Abstract

Belief Propagation algorithms acting on Graphical Models of classical 
probability distributions, such as Markov Networks, Factor Graphs and 
Bayesian Networks, are amongst the most powerful known methods for 
deriving probabilistic inferences amongst large numbers of random vari- 
ables. This paper presents a generalization of these concepts and meth- 
ods to the quantum case, based on the idea that quantum theory can be 
thought of as a noncommutative, operator-valued, generalization of clas- 
sical probability theory. Some novel characterizations of quantum condi- 
tional independence are derived, and definitions of Quantum n-Bifactor 
Networks, Markov Networks, Factor Graphs and Bayesian Networks are 
proposed. The structure of Quantum Markov Networks is investigated 
and some partial characterization results are obtained, along the lines of 
the Hammersley-Clifford theorem. A Quantum Belief Propagation algorithm
is presented and is shown to converge on 1-Bifactor Networks and
Markov Networks when the underlying graph is a tree. The use of Quan- 
tum Belief Propagation as a heuristic algorithm in cases where it is not 
known to converge is discussed. Applications to decoding quantum error 
correcting codes and to the simulation of many-body quantum systems 
are described. 

1 Introduction 

Quantum theory is first and foremost a calculus for computing the probabilities 
of outcomes of measurements made on physical systems. Therefore, the generic 
problem in quantum theory is one of probabilistic inference, i.e. given a
specified class of quantum states, compute the predicted probabilities of
measurement outcomes and their correlations. For example, computing the correlation
functions of a system in the ground state of a Hamiltonian, or computing the 
probabilities for the possible measurement outcomes after implementing a quan- 
tum circuit, are problems of this general type. Such quantum inferences present 
a formidable computational challenge as the number of subsystems becomes 
large, since the number of parameters needed to specify a quantum state grows 
exponentially with the number of subsystems, and the formulas for quantities 
of interest typically also involve an exponentially large number of terms. 

A similar problem arises in classical probabilistic inference, since the num- 
ber of terms required to specify a general probability distribution also grows 
exponentially with the number of random variables involved. A variety of algo- 
rithms for classical probabilistic inference have been discovered, of which Belief 
Propagation algorithms on Graphical Models are amongst the most powerful. 
Such algorithms are particularly interesting for two reasons. Firstly, they are 
highly parallelizable in the sense that they can be implemented by associating 
each random variable with a separate processor. Messages are received and 
sent by the processors along the links of a network corresponding to the edges 
of a graph and, importantly, the order in which the messages arrive does not 
matter. Secondly, Belief Propagation performs remarkably well as a heuristic 
algorithm, even in cases where it is not guaranteed to converge to the exact 
solution. Important examples include the near optimal decoding of low density 
[Gal63a] and turbo [BGT93a] error correction codes, spin glass models [MP01a],
and random satisfiability problems [MPZ02a]. Understanding the reasons for
this is currently an active area of research, but it is understood [Yed01a] to be
related to a hierarchy of approximation schemes commonly used in statistical 
physics. 

Due to the similarity between the classical and quantum problems, one might 
hope to leverage the power of Belief Propagation in the quantum case also, espe- 
cially since quantum theory can be regarded as a noncommutative generalization 
of classical probability theory. This is indeed the case, and in this paper we de- 
velop the necessary theory of Quantum Belief Propagation and its associated 
Graphical Models. 

This paper should be of interest to researchers in Graphical Models and Belief 
Propagation, as well as to researchers in quantum theory, particularly in quan- 
tum information and the simulation of quantum many-body systems. As such, 
it is intended to be as self-contained as possible, although we do assume famil- 
iarity with the basic formalism of quantum theory on finite dimensional Hilbert 
spaces, including the theory of density matrices, generalized measurements and 
completely positive maps, as used in quantum information theory. These are 
covered in detail in the textbook of Nielsen and Chuang [NC00a], as well as in
Preskill's lecture notes [Pre99b]. For further background on classical Graphical
Models and Belief Propagation, we suggest the texts of Lauritzen [Lau96a],
MacKay [Mac03a], and Neapolitan [Nea90a, Nea04a], as well as the review articles
by Yedidia et al. [Yed01a, YFW02a] and Aji and McEliece [AM00a].

The remainder of this paper is structured as follows. In §2 the generic
classical and quantum probabilistic inference problems are defined. In §3 we
review the notions of classical and quantum conditional independence, which
are crucial for the development of Graphical Models and Belief Propagation
algorithms. §3.1 outlines the entropic approach to conditional independence
based on the vanishing of conditional mutual information and the associated
constraints on conditional and mutual probability distributions. This entropic
approach has a straightforward quantum generalization based on the equality
conditions for strong-subadditivity, which is described in §3.2. §3.3 introduces
the quantum conditional and mutual density operators, which are analogous
to classical conditional and mutual probability distributions, and §3.4 explains
how quantum conditional independence can be characterized directly in terms
of them.

In §4 we develop the theory of quantum Graphical Models. §4.1 reviews
the definition of classical Markov Networks and the Hammersley-Clifford
theorem, which gives an explicit representation of the probability distributions
supported on them. Motivated by this, §4.2 defines the class of quantum
n-Bifactor Networks, which are the most general class of networks on which our
Belief Propagation algorithms operate. §4.3 reviews the theory of dependency
models and graphoids, which are abstractions of the conditional independence
relation, and a quantum graphoid is defined based on quantum conditional
independence. §4.4 uses the quantum graphoid to define quantum Markov Networks
and gives some partial characterization theorems, along the lines of the classical
Hammersley-Clifford theorem, which connect quantum Markov Networks to
n-Bifactor Networks. §4.5 briefly discusses quantum generalizations of two other
classical Graphical Models: Factor Graphs and Bayesian Networks. Figure 9
sketches the relation between some of these Graphical Models, and summarizes
the Quantum Belief Propagation algorithm's domain of convergence.

§5 discusses the Quantum Belief Propagation (QBP) algorithms. In §5.1 QBP
algorithms are described for n-Bifactor Networks. In §5.2 QBP is shown to
converge for 1-Bifactor Networks on trees and for general Bifactor Networks on
trees that are also Quantum Markov Networks. §6 discusses some methods for
using QBP as a heuristic algorithm in cases where it is not known to converge.
These are coarse graining (§6.1), sliding window QBP (§6.2) and the method of
replicas (§6.3).

§7 presents two applications of QBP: to decoding quantum error correcting
codes in §7.1 and to simulating many-body quantum systems in §7.2. In
particular, §7.2 explains how projected entangled-pair states, which have been
successfully used in statistical physics as approximations to the ground states of
a wide class of Hamiltonians, can be incorporated into the framework of Bifactor
Networks.

To conclude, §8 discusses the relationship to other quantum generalizations
of Graphical Models and Belief Propagation that have been proposed and §9
describes open questions and future research directions suggested by this work.

Note that a slightly unconventional notation for probability distributions on 
sets of random variables and for quantum states on tensor products of quantum 
systems is used throughout. This is very convenient for describing Graphical
Models and is reviewed in appendix A.



2 Classical and Quantum Probabilistic Inference 

Classical Graphical Models are designed to be used as tools for making
probabilistic inferences amongst large numbers of correlated random variables.
Consider a set of random variables, $V = \{v_1, v_2, \ldots, v_N\}$, each of which
takes a finite number of integer values $\{1, 2, \ldots, d\}$. To specify a general
probability distribution, $P(V)$, over the variables requires $O(d^N)$ parameters.
On learning that some subset of the variables $U \subseteq V$ take particular
values, denoted $U = \{u = j_u\}_{u \in U}$, an important task is to update the
probability for some other disjoint subset of variables $W \subseteq V$ via Bayes' rule

$$P(W|U) = \frac{P(W \cup U)}{P(U)} = \frac{\sum_{V-(U \cup W)} P(U \cup (V-U))}{\sum_{V-U} P(U \cup (V-U))}. \tag{1}$$

This immediately raises two problems. Firstly, the number of parameters needed 
to specify the input to the computation, i.e. the probability distribution itself, is 
exponential in N. We would like to specify a well-defined computational problem
in which N measures the input size. Therefore, it is not feasible to consider the 
full set of probability distributions over N variables, and attention must be 
restricted to families of distributions that can be specified with a number of 
parameters that grows only polynomially in N. Secondly, assuming that the 
sizes of U and W are held constant as N increases, eq. (1) involves sums over a
number of terms that is exponential in N. Thus, a straightforward evaluation of 
the formula would not give an efficient algorithm. The restriction on the class of 
probability distributions must somehow be used to find an alternative method 
of computation that is efficient. 
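
To make the cost of the straightforward evaluation concrete, the following is a minimal sketch of eq. (1) evaluated by brute-force summation over a full joint table. The function name `conditional`, the 0-based value labels and the random test distribution are illustrative assumptions, not part of the paper. The table has $d^N$ entries, which is exactly the exponential scaling that Graphical Models are meant to avoid.

```python
import numpy as np

def conditional(joint, evidence, query):
    """Return P(W | U) by brute-force summation, as in eq. (1).

    joint    : array with one axis per variable, entries P(v_1, ..., v_N)
    evidence : dict {axis index: observed value} describing U (0-based values)
    query    : list of axis indices of the variables in W
    The result has one axis per query variable, in increasing axis order.
    """
    # Fix the observed variables U to their values; those axes disappear.
    index = [slice(None)] * joint.ndim
    for axis, value in evidence.items():
        index[axis] = value
    restricted = joint[tuple(index)]
    # The remaining axes keep their relative order.
    remaining = [a for a in range(joint.ndim) if a not in evidence]
    sum_axes = tuple(i for i, a in enumerate(remaining) if a not in query)
    numerator = restricted.sum(axis=sum_axes)  # sum over V - (U ∪ W)
    return numerator / numerator.sum()         # normalise by P(U)

rng = np.random.default_rng(0)
N, d = 4, 2
joint = rng.random((d,) * N)   # d**N parameters: exponential in N
joint /= joint.sum()

# P(v_3 | v_1 = 1) in the 0-based labelling used here.
post = conditional(joint, evidence={0: 1}, query=[2])
```

Both the storage for `joint` and the sums inside `conditional` grow as $d^N$, which illustrates the two problems raised above.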

Classical Graphical Models are designed to provide an efficient representation 
of classes of probability distributions and Belief Propagation algorithms are 
designed to solve the corresponding inference problem. 

In quantum theory, the random variables are replaced by a set of $N$ quantum
systems $V = \{v_1, v_2, \ldots, v_N\}$, each associated with a Hilbert space of dimension
$d$. Again, it takes a number of parameters exponential in $N$ to specify a general
density operator $\rho_V$. The analog of the inference in eq. (1) is to perform a
positive operator valued measure (POVM) $\{E_U^{(j)}\}$ on a subsystem $U \subseteq V$ and,
on obtaining outcome $j$, update the state of some disjoint subsystem $W \subseteq V$
according to

$$\rho_W^{(j)} = \frac{\mathrm{Tr}_{V-W}\left(E_U^{(j)} \rho_V\right)}{\mathrm{Tr}_V\left(E_U^{(j)} \rho_V\right)}, \tag{2}$$

where $E_U^{(j)}$ is understood to act as the identity on $V - U$.



It should be noted that this quantum problem reduces to the classical case 
when all the operators involved commute and are diagonal in a product basis 
of the systems in V. In this sense eq. (2) is a noncommutative generalization
of eq. (1) and this correspondence provides the guiding principle that we use to
generalize the classical theory.
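
The quantum update can be sketched numerically as follows. This is an illustration under stated assumptions: the helpers `partial_trace`, `embed` and `update` are hypothetical names, the state is a three-qubit GHZ state chosen for the example, and the POVM element is a computational-basis projector on the first qubit. The update rule is the standard one: apply the POVM element on $U$, trace out everything except $W$, and normalise.

```python
import numpy as np

def partial_trace(rho, dims, keep):
    """Reduced operator on the subsystems listed in `keep` (ascending order)."""
    dims = list(dims)
    n = len(dims)
    rho = rho.reshape(dims + dims)
    # Trace out unwanted subsystems from the highest index down, so that the
    # axis positions of the lower-indexed subsystems are unaffected.
    for i in sorted(set(range(n)) - set(keep), reverse=True):
        rho = np.trace(rho, axis1=i, axis2=i + rho.ndim // 2)
    d = int(np.prod([dims[i] for i in keep]))
    return rho.reshape(d, d)

def embed(op, site, dims):
    """Operator acting as `op` on subsystem `site` and as identity elsewhere."""
    ops = [np.eye(d) for d in dims]
    ops[site] = op
    out = ops[0]
    for o in ops[1:]:
        out = np.kron(out, o)
    return out

def update(rho, dims, povm_element, u, w):
    """State of subsystem w after obtaining `povm_element` on subsystem u."""
    E = embed(povm_element, u, dims)
    unnormalised = partial_trace(E @ rho, dims, keep=[w])
    return unnormalised / np.trace(unnormalised)

dims = [2, 2, 2]
ghz = np.zeros(8)
ghz[0] = ghz[7] = 1 / np.sqrt(2)          # (|000> + |111>) / sqrt(2)
rho = np.outer(ghz, ghz)

E1 = np.diag([0.0, 1.0])                  # projector onto |1> on qubit 0
post = update(rho, dims, E1, u=0, w=2)    # state of qubit 2 given outcome 1
```

Because the GHZ state is perfectly correlated in the computational basis, the updated state of the third qubit is the projector onto $|1\rangle$, mirroring what classical Bayesian conditioning would give for the corresponding diagonal distribution.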

The quantum problem raises the same sort of issues as in the classical case, 
since it takes an exponential in N number of parameters to specify a state on 
N subsystems and the trace and partial trace in eq. (2) involve sums over an
exponential number of terms. In quantum many-body theory, physical con- 
siderations are often used to motivate solutions to the representation problem, 
e.g. we may restrict attention to the ground or Gibbs states of some class of 
efficiently specifiable Hamiltonians. In this paper, we take a different approach 
and instead generalize the sort of constraints that are used in defining classical 
Graphical Models. The reasons for this are twofold. Firstly, with the advent 
of quantum information science, it is relevant to solve instances of eq. (2) that
are of broader scope than those typically considered in statistical physics. For 
example, we may be interested in states that are the output of a class of poly- 
nomial quantum circuits, or in the code states of a quantum error correction 
code. The most natural way to phrase such constraints is not always in terms 
of Hamiltonians, although it may be possible to do so. Secondly, by focussing 
on constraints with a clear probabilistic and information theoretic meaning, the 
connection between the classical and quantum problems is elucidated and the 
results of the vast literature on the classical inference problem can be called into 
play. 



3 Conditional Independence 

The formal construction of classical Graphical Models is based on the idea of 
placing conditional independence constraints on sets of random variables. In 
this section, the relevant classical definitions are reviewed and their quantum 
generalizations are introduced. In §3.1 the entropic approach to conditional
independence is outlined and the corresponding constraints on conditional and
mutual probability distributions are reviewed. In §3.2 the entropic definition
is straightforwardly generalized to the quantum case by replacing the Shannon
entropy with the von Neumann entropy. In order to provide constraints on density
operators that are analogous to those for classical conditional and mutual
probability distributions, conditional and mutual density operators are defined
in §3.3 and quantum conditional independence is expressed in terms of them in
§3.4.


3.1 Classical Conditional Independence 

For a set $V$ of classical random variables with joint distribution $P(V)$, the
marginal distribution for any $U \subseteq V$ is defined as $P(U) = \sum_{V-U} P(V)$. For
any two disjoint sets $U, W \subseteq V$, the conditional distribution of $U$ given $W$ is
defined as

$$P(U|W) = \frac{P(U \cup W)}{P(W)}. \tag{3}$$

The Shannon entropy of any $U \subseteq V$ is defined as

$$H(U) = -\sum_{U} P(U) \log_2 P(U). \tag{4}$$

For disjoint $U, W \subseteq V$, the conditional entropy of $U$ given $W$ is defined as

$$H(U|W) = -\sum_{U \cup W} P(U \cup W) \log_2 P(U|W), \tag{5}$$

and satisfies the identity

$$H(U|W) = H(U \cup W) - H(W). \tag{6}$$

The mutual information between $U$ and $W$ is defined to be

$$H(U : W) = H(U) - H(U|W) \tag{7}$$
$$\phantom{H(U : W)} = H(U) + H(W) - H(U \cup W). \tag{8}$$

Note that $H(U : W) = 0$ iff $P(U \cup W) = P(U)P(W)$. For three disjoint sets
$U, W, X \subseteq V$, the conditional mutual information between $U$ and $W$, given $X$,
is defined to be

$$H(U : W|X) = H(U|X) - H(U|W \cup X) \tag{9}$$
$$\phantom{H(U : W|X)} = H(U \cup X) + H(W \cup X) - H(X) - H(U \cup W \cup X). \tag{10}$$

The condition $H(U : W|X) = 0$ is known as conditional independence of $U$ and
$W$ given $X$ and it is equivalent to any of the following conditions

$$P(U|W \cup X) = P(U|X) \tag{11}$$

$$P(W|U \cup X) = P(W|X) \tag{12}$$

$$P(U \cup W|X) = P(U|X)P(W|X) \tag{13}$$

$$P(U \cup W \cup X) = P(U|X)P(W|X)P(X). \tag{14}$$

Example 3.1. Consider a Markov chain consisting of three random variables
$u - x - w$. The defining condition for such a process is that $u$ and $w$ are
conditionally independent given $x$. Thus, eq. (14) immediately implies that the
joint probability distribution has the form

$$P(u, x, w) = P(u|x)P(w|x)P(x). \tag{15}$$

In general, a joint distribution of three variables can be written as
$P(u, x, w) = P(w|u, x)P(x|u)P(u) = P(u|x, w)P(x|w)P(w)$ and so eqs. (11) and (12) imply
that $P(u, x, w)$ can also be written as

$$P(u, x, w) = P(w|x)P(x|u)P(u) \tag{16}$$
$$P(u, x, w) = P(u|x)P(x|w)P(w). \tag{17}$$

The three equivalent decompositions given in eqs. (15)-(17) are suggestive of
three different types of causal scenario that might give rise to such a Markov
chain:

(15) suggests $x$ is a common cause of $u$ and $w$: $u \leftarrow x \rightarrow w$
(16) suggests $u$ causes $x$ and then $x$ causes $w$: $u \rightarrow x \rightarrow w$
(17) suggests $w$ causes $x$ and then $x$ causes $u$: $u \leftarrow x \leftarrow w$.

The common feature of these three scenarios is that in each case all the correla- 
tions between u and w are mediated by x. Ultimately, conditional independence 
captures this common feature rather than implying any specific causal scenario. 
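
The Markov chain example can be checked numerically. The sketch below is an illustration, with randomly generated conditional distributions and hypothetical helper names `entropy` and `cond_mutual_info`. It builds a joint distribution of the form of eq. (15), evaluates the conditional mutual information through eq. (10), and confirms that the alternative decomposition of eq. (16) gives the same joint distribution.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array of any shape, eq. (4)."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def cond_mutual_info(joint):
    """H(u : w | x) for an array joint[u, x, w], evaluated via eq. (10)."""
    p_ux = joint.sum(axis=2)
    p_xw = joint.sum(axis=0)
    p_x = joint.sum(axis=(0, 2))
    return entropy(p_ux) + entropy(p_xw) - entropy(p_x) - entropy(joint)

rng = np.random.default_rng(1)
d = 3
prior_x = rng.random(d)
prior_x /= prior_x.sum()
p_u_given_x = rng.random((d, d))          # indexed [u, x]
p_u_given_x /= p_u_given_x.sum(axis=0)
p_w_given_x = rng.random((d, d))          # indexed [w, x]
p_w_given_x /= p_w_given_x.sum(axis=0)

# eq. (15): P(u, x, w) = P(u|x) P(w|x) P(x)
chain = np.einsum('ux,wx,x->uxw', p_u_given_x, p_w_given_x, prior_x)

# eq. (16): P(u, x, w) = P(w|x) P(x|u) P(u), rebuilt from the marginals.
p_u = chain.sum(axis=(1, 2))
p_x_given_u = chain.sum(axis=2) / p_u[:, None]   # indexed [u, x]
chain_16 = np.einsum('wx,ux,u->uxw', p_w_given_x, p_x_given_u, p_u)

# A generic joint distribution, for contrast.
generic = rng.random((d, d, d))
generic /= generic.sum()

cmi_chain = cond_mutual_info(chain)       # zero up to rounding
cmi_generic = cond_mutual_info(generic)   # strictly positive
```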

The example shows that care should be taken when interpreting a decompo- 
sition of a joint probability distribution into conditional and marginal distribu- 
tions. Conditional independence is about the structure of correlations between 
random variables rather than their specific causal relations. For this reason it 
is often useful to replace conditional probabilities with an object that is more 
closely connected with correlation. 

The mutual probability distribution of disjoint $U, W \subseteq V$ is given by

$$P(U : W) = \frac{P(U \cup W)}{P(U)P(W)} = \frac{P(U|W)}{P(U)}. \tag{18}$$

As the name implies, this is related to the mutual information and it is easy to
check that eq. (7) can be rewritten as

$$H(U : W) = \sum_{U \cup W} P(U, W) \log_2 P(U : W). \tag{19}$$

The conditional independence conditions eqs. (11)-(14) can be re-expressed in
terms of mutual distributions as

$$P(U : W \cup X) = P(U : X) \tag{20}$$

$$P(W : U \cup X) = P(W : X) \tag{21}$$

$$P(U \cup W : X) = P(U : X)P(W : X) \tag{22}$$

$$P(U \cup W \cup X) = P(U : X)P(W : X)P(X)P(U)P(W). \tag{23}$$

Example 3.2. Returning to the Markov chain of example 3.1, the decompositions
eqs. (15)-(17) can all be rewritten in terms of mutual distributions by
replacing each conditional probability with the product of a marginal and a mutual
distribution using the relation $P(U|W) = P(U : W)P(U)$. All three decompositions
reduce to the same expression:

$$P(u, x, w) = P(u)P(x)P(w)P(u : x)P(x : w). \tag{24}$$

This decomposition clearly shows that all correlations between $u$ and $w$ are
mediated by $x$ and avoids the causal ambiguities that are implicit in the use of
conditional probabilities.
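
The mutual-distribution form can be verified in the same way. The following sketch, again with randomly generated data and illustrative variable names, computes the mutual distributions of eq. (18) for a Markov chain, rebuilds the joint distribution through eq. (24), and compares the two expressions for the mutual information in eqs. (8) and (19).

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, eq. (4)."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(2)
d = 3
prior_x = rng.random(d)
prior_x /= prior_x.sum()
p_u_given_x = rng.random((d, d))
p_u_given_x /= p_u_given_x.sum(axis=0)
p_w_given_x = rng.random((d, d))
p_w_given_x /= p_w_given_x.sum(axis=0)
chain = np.einsum('ux,wx,x->uxw', p_u_given_x, p_w_given_x, prior_x)

# Marginals of the chain.
p_u = chain.sum(axis=(1, 2))
p_x = chain.sum(axis=(0, 2))
p_w = chain.sum(axis=(0, 1))
p_ux = chain.sum(axis=2)
p_xw = chain.sum(axis=0)

# Mutual distributions, eq. (18).
m_ux = p_ux / np.outer(p_u, p_x)
m_xw = p_xw / np.outer(p_x, p_w)

# eq. (24): P(u, x, w) = P(u) P(x) P(w) P(u : x) P(x : w)
rebuilt = np.einsum('u,x,w,ux,xw->uxw', p_u, p_x, p_w, m_ux, m_xw)

# Mutual information from eq. (19) and from eq. (8).
mi_from_mutual = float((p_ux * np.log2(m_ux)).sum())
mi_from_entropies = entropy(p_u) + entropy(p_x) - entropy(p_ux)
```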

3.2 Quantum Conditional Independence

Turning now to the quantum case, if $V$ is a set of subsystems then the joint
state is a density operator $\rho_V$. For $U \subseteq V$, the analog of a marginal distribution
is the reduced state obtained by taking a partial trace over $V - U$, i.e.
$\rho_U = \mathrm{Tr}_{V-U}(\rho_V)$. The Shannon entropy is replaced by the von Neumann entropy,
defined as

$$S(U) = -\mathrm{Tr}(\rho_U \log_2 \rho_U). \tag{25}$$

Quantum analogs of conditional and mutual probability distributions are not 
commonly discussed in the literature, but they are needed to obtain decompositions
of the joint density operator analogous to eqs. (11)-(14) and eqs. (20)-(23), so
they are introduced in the next section. For now, note that the quantum
conditional entropy, mutual information and conditional mutual information can
already be defined by simply replacing $H$ with $S$ in the expressions (6), (8)
and (10), since these expressions only involve joint and marginal probability
distributions.

By comparison with the classical case, it is natural to consider $S(U : W|X) = 0$
as a definition of quantum conditional independence. In fact, the inequality
$S(U : W|X) \geq 0$ always holds and is known as strong subadditivity, so quantum
conditional independence is simply the equality condition for strong
subadditivity. This equality condition has been investigated extensively and has been
shown [HJPW03a] to be equivalent to the existence of a decomposition of the
Hilbert space $\mathcal{H}_X$ of the form

$$\mathcal{H}_X = \bigoplus_{j=1}^{d} \mathcal{H}_{X_j^L} \otimes \mathcal{H}_{X_j^R}, \tag{26}$$

(the superscripts $L$ and $R$ indicate the left and right sector of the tensor product)
such that the joint density operator $\rho_{U \cup W \cup X}$ can be written as

$$\rho_{U \cup W \cup X} = \sum_{j=1}^{d} p_j \, \sigma_{U X_j^L} \otimes \tau_{X_j^R W}, \tag{27}$$

where $0 \leq p_j \leq 1$, $\sum_j p_j = 1$, and $\sigma_{U X_j^L}$ and $\tau_{X_j^R W}$ are density operators on
$\mathcal{H}_U \otimes \mathcal{H}_{X_j^L}$ and $\mathcal{H}_{X_j^R} \otimes \mathcal{H}_W$ respectively.

Less explicit formulations of the equality condition have also been found
[Rus02b], such as the operator equality

$$\log \rho_{U \cup W \cup X} + \log \rho_X = \log \rho_{U \cup X} + \log \rho_{W \cup X}, \tag{28}$$

where the logarithms are restricted to the supports of the operators.
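
As an illustration of the entropic definition, the sketch below evaluates $S(U : W|X)$ from von Neumann entropies, using the quantum version of eq. (10). It assumes four qubits ordered as $U$, $X^L$, $X^R$, $W$, and takes the simplest instance of eq. (27), a single term ($d = 1$) of the form $\sigma_{U X^L} \otimes \tau_{X^R W}$ with randomly generated factors. The helper names are hypothetical. A generic random state is included for contrast.

```python
import numpy as np

def partial_trace(rho, dims, keep):
    """Reduced operator on the subsystems listed in `keep` (ascending order)."""
    dims = list(dims)
    n = len(dims)
    rho = rho.reshape(dims + dims)
    for i in sorted(set(range(n)) - set(keep), reverse=True):
        rho = np.trace(rho, axis1=i, axis2=i + rho.ndim // 2)
    d = int(np.prod([dims[i] for i in keep]))
    return rho.reshape(d, d)

def vn_entropy(rho):
    """Von Neumann entropy in bits, eq. (25)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def quantum_cmi(rho, dims, U, X, W):
    """S(U : W | X) = S(UX) + S(WX) - S(X) - S(UWX)."""
    S = lambda keep: vn_entropy(partial_trace(rho, dims, sorted(keep)))
    return S(U + X) + S(W + X) - S(X) - S(U + W + X)

def random_state(dim, rng):
    """Random full-rank density matrix (illustrative construction)."""
    G = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(3)
dims = [2, 2, 2, 2]                 # order: U, X^L, X^R, W
U, X, W = [0], [1, 2], [3]

sigma = random_state(4, rng)        # state on U and X^L
tau = random_state(4, rng)          # state on X^R and W
markov = np.kron(sigma, tau)        # eq. (27) with a single term
generic = random_state(16, rng)

cmi_markov = quantum_cmi(markov, dims, U, X, W)     # zero up to rounding
cmi_generic = quantum_cmi(generic, dims, U, X, W)   # positive, by strong subadditivity
```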

3.3 Conditional and Mutual Density Operators 

Quantum conditional independence can be expressed in a form closer to the 
classical conditions eqs. (11)-(14) and (20)-(23) by introducing definitions of
conditional and mutual density operators. For this purpose, it is convenient to define
a family of products for pairs of operators $A$, $B$ as follows.

$$A \star^{(n)} B \equiv \left(A^{\frac{1}{2n}} B^{\frac{1}{n}} A^{\frac{1}{2n}}\right)^n \tag{29}$$

An important property of the $\star^{(n)}$ products is that if $A$ and $B$ are both positive
operators then $A \star^{(n)} B$ is also positive. In what follows, the most frequently
used of these products are $A \star B \equiv A \star^{(1)} B$ and

$$A \odot B \equiv \lim_{n \to \infty} \left(A \star^{(n)} B\right). \tag{30}$$

Note that whilst $\odot$ is commutative and associative, $\star^{(n)}$ is neither in general,
so particular attention must be paid to the ordering of operators.

The $\odot$ product was previously introduced in [War05a], in the context of a
Bayesian calculus for quantum theory, and it satisfies the formula

$$A \odot B = \exp\left(\log A + \log B\right), \tag{31}$$

whenever $A$ and $B$ are strictly positive. If $A$ and $B$ are semi-positive, then
eq. (31) may be extended by restricting the action of the logarithm to the
supports of the operators.
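
The relation between the $\star^{(n)}$ products and the $\odot$ product can be explored numerically. The following sketch assumes strictly positive random matrices and uses hypothetical helper names (`herm_fun`, `star`, `odot`). It implements eq. (29) directly and compares it with the closed form of eq. (31) for increasing $n$.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(A)
    return (v * f(w)) @ v.conj().T

def star(A, B, n):
    """A star^(n) B = (A^(1/2n) B^(1/n) A^(1/2n))^n, as in eq. (29)."""
    a = herm_fun(A, lambda w: w ** (1.0 / (2 * n)))
    b = herm_fun(B, lambda w: w ** (1.0 / n))
    M = a @ b @ a
    M = (M + M.conj().T) / 2        # remove rounding asymmetry
    return np.linalg.matrix_power(M, n)

def odot(A, B):
    """A odot B = exp(log A + log B), eq. (31), for strictly positive A and B."""
    return herm_fun(herm_fun(A, np.log) + herm_fun(B, np.log), np.exp)

def random_positive(dim, rng):
    """Random strictly positive matrix with unit trace (illustrative)."""
    G = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    P = G @ G.conj().T + np.eye(dim)
    return P / np.trace(P).real

rng = np.random.default_rng(4)
A = random_positive(3, rng)
B = random_positive(3, rng)

target = odot(A, B)
ns = [1, 4, 16, 64, 256]
errors = [np.linalg.norm(star(A, B, n) - target) for n in ns]
min_eig = np.linalg.eigvalsh(star(A, B, 4)).min()   # positivity check
```

The list `errors` records the distance between $A \star^{(n)} B$ and $A \odot B$; it is expected to shrink as $n$ grows, in line with the limit in eq. (30).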

The $\star^{(n)}$ products can be used to define a family of conditional density
operators. Let $V$ be a set of quantum systems in a state $\rho_V$ and let $U, W \subseteq V$
be disjoint. Define

$$\rho^{(n)}_{U|W} \equiv \rho_W^{-1} \star^{(n)} \rho_{U \cup W}, \tag{32}$$

where $^{-1}$ denotes the Moore-Penrose pseudoinverse¹. Note that if $W = \emptyset$, so
that $\mathcal{H}_W = \mathbb{C}$ is the trivial Hilbert space, then $\rho^{(n)}_{U|W} = \rho_U$. The conditional
density operators used most frequently in this paper are $\rho_{U|W} \equiv \rho^{(1)}_{U|W}$ and
$\rho^{(\infty)}_{U|W} \equiv \rho_W^{-1} \odot \rho_{U \cup W}$.

The operator $\rho^{(\infty)}_{U|W}$ was originally introduced [CA97a] because it allows the
quantum conditional entropy to be expressed via a formula analogous to eq. (5)

$$S(U|W) = -\mathrm{Tr}\left(\rho_{U \cup W} \log_2 \rho^{(\infty)}_{U|W}\right). \tag{33}$$

The operator $\rho_{U|W}$ was introduced in [Lei06a, Lei06b, AKMS06a] and also
exhibits strong analogies with classical conditional probability.
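
Both conditional density operators can be constructed explicitly for a small example. The sketch below assumes a random full-rank two-qubit state, so that ordinary inverses can stand in for pseudoinverses, with the first qubit playing the role of $U$ and the second that of $W$. All helper names are illustrative. It checks eq. (33) against $S(U \cup W) - S(W)$, and also checks that the $n = 1$ operator satisfies $\mathrm{Tr}_U \rho_{U|W} = I_W$, a property that follows directly from eq. (32) for full-rank $\rho_W$ and mirrors the normalisation of a classical conditional distribution.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(A)
    return (v * f(w)) @ v.conj().T

def vn_entropy(rho):
    """Von Neumann entropy in bits, eq. (25)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

rng = np.random.default_rng(5)
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho_UW = G @ G.conj().T
rho_UW /= np.trace(rho_UW).real                      # full-rank two-qubit state

# Reduced state on W (second qubit): trace out the first tensor factor.
rho_W = np.trace(rho_UW.reshape(2, 2, 2, 2), axis1=0, axis2=2)
I2 = np.eye(2)

# n = 1: rho_{U|W} = rho_W^{-1/2} rho_{UW} rho_W^{-1/2}, identity implied on U.
K = np.kron(I2, herm_fun(rho_W, lambda w: w ** -0.5))
cond_1 = K @ rho_UW @ K
trace_over_U = np.trace(cond_1.reshape(2, 2, 2, 2), axis1=0, axis2=2)

# n -> infinity: rho^{(inf)}_{U|W} = exp(log rho_{UW} - log rho_W), via eq. (31).
log_diff = herm_fun(rho_UW, np.log) - np.kron(I2, herm_fun(rho_W, np.log))
cond_inf = herm_fun(log_diff, np.exp)

# eq. (33) compared with the entropic definition S(U|W) = S(UW) - S(W).
lhs = -np.trace(rho_UW @ herm_fun(cond_inf, np.log2)).real
rhs = vn_entropy(rho_UW) - vn_entropy(rho_W)
```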

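The corresponding family of mutual density operators is defined similarly via

$$\rho^{(n)}_{U:W} \equiv \left(\rho_U^{-1} \otimes \rho_W^{-1}\right) \star^{(n)} \rho_{U \cup W} = \rho_U^{-1} \star^{(n)} \rho^{(n)}_{U|W}, \tag{34}$$

with $\rho^{(\infty)}_{U:W}$ and $\rho_{U:W}$ defined in the obvious way.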

The operator $\rho^{(\infty)}_{U:W}$ was introduced [CA97a] in order to express the quantum
mutual information via a formula analogous to eq. (19)

$$S(U : W) = \mathrm{Tr}\left(\rho_{U \cup W} \log_2 \rho^{(\infty)}_{U:W}\right). \tag{35}$$

¹ In the present case this means that $\rho_W^{-1}$ is the inverse of $\rho_W$ when restricted to the support
of $\rho_W$ and has the same null space as $\rho_W$.

3.4 Constraints on Conditional and Mutual Density Operators

In this section, quantum conditional independence is shown to be equivalent to
constraints on conditional and mutual density operators analogous to
eqs. (11)-(14) and eqs. (20)-(23).

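Theorem 3.3. If $S(U : W|X) = 0$ then the following conditions hold:

$$\rho^{(n)}_{U|X \cup W} = \rho^{(n)}_{U|X} \otimes P_W \tag{36}$$
$$\rho^{(n)}_{W|X \cup U} = \rho^{(n)}_{W|X} \otimes P_U \tag{37}$$
$$\rho^{(n)}_{U \cup W|X} = \rho^{(n)}_{U|X} \rho^{(n)}_{W|X} \tag{38}$$
$$\rho_{U \cup W \cup X} = \rho_X \star^{(n)} \left(\rho^{(n)}_{U|X} \rho^{(n)}_{W|X}\right), \tag{39}$$

where $P_W$ is the projector onto the support of $\rho_W$ and $P_U$ is the projector onto
the support of $\rho_U$.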

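Proof. These conditions are a direct consequence of the decomposition given in
eq. (27). Since the subspaces $\mathcal{H}_{X_j^L} \otimes \mathcal{H}_{X_j^R}$ for different $j$ are orthogonal
summands in the direct sum decomposition of $\mathcal{H}_X$, it follows that the operators
$\sigma_{U X_j^L}$ have disjoint support. Similarly, the operators $\tau_{X_j^R W}$ have disjoint
support. Hence, to prove eq. (36) note that

$$\rho_{W \cup X} = \sum_{j=1}^{d} p_j \, \sigma_{X_j^L} \otimes \tau_{X_j^R W} \tag{40}$$

and hence

$$\rho^{(n)}_{U|W \cup X} = \rho_{W \cup X}^{-1} \star^{(n)} \rho_{U \cup W \cup X} \tag{41}$$
$$= \sum_{j=1}^{d} \left(\sigma_{X_j^L}^{-1} \star^{(n)} \sigma_{U X_j^L}\right) \otimes \left(\tau_{X_j^R W}^{-1} \star^{(n)} \tau_{X_j^R W}\right) \tag{42}$$
$$= \sum_{j=1}^{d} \left(\sigma_{X_j^L}^{-1} \star^{(n)} \sigma_{U X_j^L}\right) \otimes P_{X_j^R W} \tag{43}$$
$$= \rho^{(n)}_{U|X} \otimes P_W, \tag{44}$$

where $P_{X_j^R W}$ is the projector onto the support of $\tau_{X_j^R W}$.

Eqs. (37) and (38) are proved similarly, with the proviso that the decomposition
given in eq. (27) implies that $\rho^{(n)}_{U|X}$ and $\rho^{(n)}_{W|X}$ commute, which is necessary
to prove eq. (38). Finally, (39) is equivalent to (38) via the definition of a
conditional density operator. □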

It is straightforward to adapt the proof in order to arrive at analogous de- 
compositions in terms of mutual density operators. 
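
Theorem 3.4. If $S(U : W|X) = 0$ then the following conditions hold:

$$\rho^{(n)}_{U : X \cup W} = \rho^{(n)}_{U:X} \otimes P_W \tag{45}$$
$$\rho^{(n)}_{W : X \cup U} = \rho^{(n)}_{W:X} \otimes P_U \tag{46}$$
$$\rho^{(n)}_{U \cup W : X} = \rho^{(n)}_{U:X} \rho^{(n)}_{W:X} \tag{47}$$
$$\rho_{U \cup W \cup X} = \left(\rho_U \otimes \rho_W \otimes \rho_X\right) \star^{(n)} \left(\rho^{(n)}_{U:X} \rho^{(n)}_{W:X}\right). \tag{48}$$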


It remains to determine whether any converse implications hold, i.e. which
of the conditions eqs. (36)-(39) and (45)-(48) imply that $S(U : W|X) = 0$. For
this purpose, it is only necessary to consider eqs. (36)-(38) because eqs. (45)-(48)
are equivalent to eqs. (36)-(39) via the definition of a mutual density operator
and eq. (39) is equivalent to eq. (38) via the definition of a conditional density
operator. In general, the situation appears to be more complicated than in the
classical case and we are only able to obtain tight converse results for the cases
$n \to \infty$ and $n = 1$.

Theorem 3.5. In the limit $n \to \infty$, all the converse implications hold, i.e.
any of the conditions (36)-(38) imply that $S(U : W|X) = 0$.

Proof. These results are simple consequences of the equality condition given in
eq. (28). For eq. (36) we have

$$\rho_{W \cup X}^{-1} \odot \rho_{U \cup W \cup X} = \rho_X^{-1} \odot \rho_{U \cup X}. \tag{49}$$

Using eq. (31) gives

$$\exp\left(\log \rho_{U \cup W \cup X} - \log \rho_{W \cup X}\right) = \exp\left(\log \rho_{U \cup X} - \log \rho_X\right). \tag{50}$$

Taking logarithms and rearranging gives eq. (28). The proofs for eqs. (37) and
(38) follow by similar arguments. □

For the $n = 1$ case, eqs. (36) and (37) imply converse results.

Theorem 3.6. If $\rho_{U|X \cup W} = \rho_{U|X}$ or $\rho_{W|X \cup U} = \rho_{W|X}$ then $S(U : W|X) = 0$.

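Proof. As explained in [HJPW03a], Uhlmann's theorem [Uhl77a] implies that
$S(U : W|X) = 0$ iff there exists a trace preserving, completely positive map
$\mathcal{E}_{U \cup X \cup W|U \cup X} : \mathcal{L}(\mathcal{H}_U \otimes \mathcal{H}_X) \to \mathcal{L}(\mathcal{H}_U \otimes \mathcal{H}_X \otimes \mathcal{H}_W)$, such that both

$$\mathcal{E}_{U \cup X \cup W|U \cup X}(\rho_U \otimes \rho_X) = \rho_U \otimes \rho_{X \cup W} \tag{51}$$
$$\mathcal{E}_{U \cup X \cup W|U \cup X}(\rho_{U \cup X}) = \rho_{U \cup X \cup W} \tag{52}$$

hold simultaneously. In the present case, this can be achieved via a map of the
form $\mathcal{E}_{U \cup X \cup W|U \cup X} = \mathcal{I}_U \otimes \mathcal{F}_{X \cup W|X}$, where $\mathcal{I}_U$ is the identity superoperator on
$\mathcal{L}(\mathcal{H}_U)$ and $\mathcal{F}_{X \cup W|X} : \mathcal{L}(\mathcal{H}_X) \to \mathcal{L}(\mathcal{H}_X \otimes \mathcal{H}_W)$ is a trace preserving completely
positive map. $\mathcal{F}_{X \cup W|X}$ is defined via a Kraus representation
$\mathcal{F}_{X \cup W|X}(\sigma_X) = \sum_j M^{(j)}_{X \cup W|X} \sigma_X M^{(j)\dagger}_{X \cup W|X}$, where

$$M^{(j)}_{X \cup W|X} = \rho_{X \cup W}^{\frac{1}{2}} \, |j\rangle_W \, \rho_X^{-\frac{1}{2}}, \tag{53}$$

and $|j\rangle_W$ are basis vectors for $\mathcal{H}_W$.

It is straightforward to check that $\sum_j M^{(j)\dagger}_{X \cup W|X} M^{(j)}_{X \cup W|X} = P_X$, where $P_X$
is the projector onto the support of $\rho_X$. This can easily be extended to be a
trace preserving map by adding an extra Kraus operator that has support only
in the subspace orthogonal to the support of $\rho_X$, but this can be omitted for the
present purpose since it doesn't change the action of $\mathcal{E}_{U \cup X \cup W|U \cup X}$ on $\rho_U \otimes \rho_X$
or $\rho_{U \cup X}$. It is straightforward to check that $\mathcal{F}_{X \cup W|X}(\rho_X) = \rho_{X \cup W}$, so the first
condition is satisfied. The action on $\rho_{U \cup X}$ is given by

$$\mathcal{E}_{U \cup X \cup W|U \cup X}(\rho_{U \cup X}) = \sum_j \left(I_U \otimes M^{(j)}_{X \cup W|X}\right) \rho_{U \cup X} \left(I_U \otimes M^{(j)\dagger}_{X \cup W|X}\right) \tag{54}$$
$$= \sum_j \rho_{X \cup W}^{\frac{1}{2}} \, |j\rangle_W \langle j|_W \, \rho_X^{-\frac{1}{2}} \rho_{U \cup X} \rho_X^{-\frac{1}{2}} \rho_{X \cup W}^{\frac{1}{2}} \tag{55}$$
$$= \rho_{X \cup W}^{\frac{1}{2}} \, \rho_{U|X} \, \rho_{X \cup W}^{\frac{1}{2}}. \tag{56}$$

By assumption, $\rho_{U|X \cup W} = \rho_{U|X}$, so it follows that
$\rho_{U \cup X \cup W} = \rho_{X \cup W}^{\frac{1}{2}} \rho_{U|X} \rho_{X \cup W}^{\frac{1}{2}}$,
as required. The result for $\rho_{W|X \cup U} = \rho_{W|X}$ follows by symmetry. □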



For $n < \infty$, it is not true that eq. (38) implies conditional independence, even
in the case $n = 1$. This is illustrated by the following counterexample.

Example 3.7. Let $U$ and $W$ be single qubits, and $X$ be composed of two qubits
labeled $X^L$ and $X^R$. For $\epsilon > 0$, consider the normalized state

puuxuw - 7T^—-^-rTpx (Puux^ ® ^wux'^) ' (57) 

(1 -e)- +3(e/3)™ V / 

whc' 

= (^ ~ APz, ^ 3 



px = {l~ ^^)Px-ux^ + ^Px^ux^ (58) 



and where $P^{\pm}_{AB}$ denote the projectors onto the symmetric and antisymmetric
subspaces of $\mathcal{H}_A \otimes \mathcal{H}_B$. The conditional states are

2 

Pu\x = I , , Puux^ ® Ix'^ and (59) 

^(1-6)^+3(6/3)^ 

Pw\x = ^===^=-fx^ ® Pwuxn ■ (60) 

^(1-6)^+3(6/3)^ 

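By construction, condition (38) is easily verified:
$\rho_{U \cup X \cup W} = \rho_X \star^{(n)} \left(\rho^{(n)}_{U|X} \rho^{(n)}_{W|X}\right)$.

In the limit $\epsilon \to 0$, the state $\rho_{U \cup X \cup W} \to P^{-}_{U \cup W} \otimes P^{-}_{X^L \cup X^R}$, which has
$S(U : W|X) = 2$. By continuity, we claim that for all $n < \infty$, there exists an
$\epsilon > 0$ such that $\rho_{U \cup X \cup W}$ is a density operator that does not saturate
strong subadditivity.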
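
The value of the conditional mutual information in the limiting state can be confirmed directly. The sketch below assumes the qubit ordering $U$, $X^L$, $X^R$, $W$ and uses illustrative helper names. It builds the product of a singlet on $U$ and $W$ with a singlet on $X^L$ and $X^R$, then evaluates $S(U : W|X)$ through the quantum version of eq. (10).

```python
import numpy as np

def partial_trace(rho, dims, keep):
    """Reduced operator on the subsystems listed in `keep` (ascending order)."""
    dims = list(dims)
    n = len(dims)
    rho = rho.reshape(dims + dims)
    for i in sorted(set(range(n)) - set(keep), reverse=True):
        rho = np.trace(rho, axis1=i, axis2=i + rho.ndim // 2)
    d = int(np.prod([dims[i] for i in keep]))
    return rho.reshape(d, d)

def vn_entropy(rho):
    """Von Neumann entropy in bits, eq. (25)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

dims = [2, 2, 2, 2]                       # order: U, X^L, X^R, W

# Singlet amplitudes s[a, b] for (|01> - |10>) / sqrt(2).
s = np.array([[0.0, 1.0], [-1.0, 0.0]]) / np.sqrt(2)

# psi[u, xl, xr, w] = s[u, w] * s[xl, xr]: singlets on (U, W) and (X^L, X^R).
psi = np.einsum('uw,ab->uabw', s, s).reshape(16)
rho = np.outer(psi, psi.conj())

S_UX = vn_entropy(partial_trace(rho, dims, [0, 1, 2]))
S_WX = vn_entropy(partial_trace(rho, dims, [1, 2, 3]))
S_X = vn_entropy(partial_trace(rho, dims, [1, 2]))
S_UWX = vn_entropy(rho)

cmi = S_UX + S_WX - S_X - S_UWX           # evaluates to 2 bits
```

Here $X$ is in a pure state and carries no correlation with $U$ or $W$, so conditioning on it cannot account for the two bits of mutual information shared by the singlet on $U$ and $W$.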

The preceding example shows that some of the conditions given in eqs. (36)-(39)
are not sufficient to imply quantum conditional independence on their own.
Therefore, additional constraints need to be imposed in order to obtain converse
results. Two alternative approaches are considered here, one based on additional
commutation conditions that hold for conditionally independent states and one
based on the algebraic structure of such states. The approach based on
commutation conditions is perhaps more elegant, but the algebraic conditions are also
relevant because they are used in theorem 4.11 in §4.4 to provide a
characterization result for quantum Markov Networks on trees. The following sequence
of results provides the approach based on commutation conditions.

Theorem 3.8. For a fixed $n$, if $\rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}}$ and its adjoint commute with
$\rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}}$, then the conditions given in eqs. (36)-(39) are all equivalent.

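Proof. We start by showing that $\rho^{(n)}_{U|W \cup X} = \rho^{(n)}_{U|X}$ is equivalent to
$\rho^{(n)}_{W|U \cup X} = \rho^{(n)}_{W|X}$. The first of these can be written explicitly in terms of joint
and reduced density operators as

$$\rho_{W \cup X}^{-\frac{1}{2n}} \rho_{U \cup W \cup X}^{\frac{1}{n}} \rho_{W \cup X}^{-\frac{1}{2n}} = \rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{n}} \rho_X^{-\frac{1}{2n}}. \tag{61}$$

Left and right multiplying by $\rho_{W \cup X}^{\frac{1}{2n}}$ gives

$$\rho_{U \cup W \cup X}^{\frac{1}{n}} = \rho_{W \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{n}} \rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}}. \tag{62}$$

Now, define $T = \rho_{W \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}}$, so that
$\rho_{U \cup W \cup X}^{\frac{1}{n}} = T T^{\dagger}$. In a similar fashion,
$\rho^{(n)}_{W|U \cup X} = \rho^{(n)}_{W|X}$ can be shown to be equivalent to $\rho_{U \cup W \cup X}^{\frac{1}{n}} = T^{\dagger} T$.
Now,

$$T^{\dagger} = \rho_{U \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}} \tag{63}$$
$$= \rho_X^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}} \tag{64}$$
$$= \rho_X^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}} \tag{65}$$
$$= \rho_{W \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}} \tag{66}$$
$$= T, \tag{67}$$

where the assumption that $\rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}}$ commutes with $\rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}}$ has been used
to derive eq. (65). Hence, $T$ is Hermitian and the two conditions are equivalent.

For the remaining condition note that $\rho^{(n)}_{U \cup W|X} = \rho^{(n)}_{U|X} \rho^{(n)}_{W|X}$ is equivalent to

$$\rho_{U \cup W \cup X}^{\frac{1}{n}} = \rho_{U \cup X}^{\frac{1}{n}} \rho_X^{-\frac{1}{n}} \rho_{W \cup X}^{\frac{1}{n}} \tag{68}$$
$$= \rho_{U \cup X}^{\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}}. \tag{69}$$

The commutativity of $\rho_{U \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}}$ and $\rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}}$ then gives

$$\rho_{U \cup W \cup X}^{\frac{1}{n}} = \rho_{U \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}} \tag{70}$$
$$= \rho_X^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}}, \tag{71}$$

and the commutativity of $\rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}}$ and $\rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}}$ gives

$$\rho_{U \cup W \cup X}^{\frac{1}{n}} = \rho_X^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}} \tag{72}$$
$$= \rho_{W \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{n}} \rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}}, \tag{73}$$

which is equivalent to $\rho^{(n)}_{U|W \cup X} = \rho^{(n)}_{U|X}$. □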



Theorem 3.8 relates the conditions eqs. (36)-(38) for a fixed value of $n$, but the
conditions for different values of $n$ can also be related via the following corollary.

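Corollary 3.9. For fixed $n$, if $\rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}}$ and its adjoint commute with
$\rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}}$, then $\rho^{(n)}_{U|W \cup X} = \rho^{(n)}_{U|X}$ implies
$\rho^{(2n)}_{U \cup W|X} = \rho^{(2n)}_{U|X} \rho^{(2n)}_{W|X}$.

Proof. In the preceding proof it was shown that $\rho^{(n)}_{U|W \cup X} = \rho^{(n)}_{U|X}$ is equivalent
to $\rho_{U \cup W \cup X}^{\frac{1}{n}} = T T^{\dagger}$, where $T = \rho_{W \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}}$, and that the commutativity
conditions imply that $T$ is Hermitian. Therefore, $\rho_{U \cup W \cup X}^{\frac{1}{n}} = T^2$, which
implies $\rho_{U \cup W \cup X}^{\frac{1}{2n}} = \rho_{U \cup X}^{\frac{1}{2n}} \rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}}$. The latter is straightforwardly
equivalent to $\rho^{(2n)}_{U \cup W|X} = \rho^{(2n)}_{U|X} \rho^{(2n)}_{W|X}$. □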

Putting these results together leads to a set of necessary and sufficient conditions
for conditional independence.

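Corollary 3.10. If $\rho_X^{-\frac{1}{2n}} \rho_{U \cup X}^{\frac{1}{2n}}$ and its adjoint commute with
$\rho_X^{-\frac{1}{2n}} \rho_{W \cup X}^{\frac{1}{2n}}$ for every $n$, then any of the conditions given in eqs. (36)-(38)
imply that $S(U : W|X) = 0$.

Proof. Under these commutativity conditions, theorem 3.8 implies that eqs.
(36)-(38) are equivalent for any fixed $m$ and corollary 3.9 shows that
$\rho^{(2m)}_{U \cup W|X} = \rho^{(2m)}_{U|X} \rho^{(2m)}_{W|X}$ can be derived from
$\rho^{(m)}_{U|W \cup X} = \rho^{(m)}_{U|X}$. By applying theorem 3.8 with
$n = 2m$, it follows that $\rho^{(m)}_{U|W \cup X} = \rho^{(m)}_{U|X}$ implies
$\rho^{(2m)}_{U|W \cup X} = \rho^{(2m)}_{U|X}$. By induction,
this implies that $\rho^{(2^s m)}_{U|W \cup X} = \rho^{(2^s m)}_{U|X}$ for any positive integer $s$. Taking the limit
$s \to \infty$ gives $\rho^{(\infty)}_{U|W \cup X} = \rho^{(\infty)}_{U|X}$, which implies $S(U : W|X) = 0$ by theorem
3.5. □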

We now turn to the algebraic approach to proving converse results. Firstly,
note that eq. (38) implies that $\rho^{(n)}_{U|X}$ and $\rho^{(n)}_{W|X}$ commute, since $\rho^{(n)}_{U \cup W|X}$ is
Hermitian. It can be shown that whenever two operators $A_{U \cup X} \otimes I_W$ and $I_U \otimes B_{W \cup X}$
commute there exists a decomposition of $\mathcal{H}_X$ as in eq. (26) such that

$$A_{U \cup X} = \sum_{j=1}^{d} a_{U X_j^L} \otimes I_{X_j^R} \quad \text{and} \tag{74}$$

$$B_{W \cup X} = \sum_{j=1}^{d} I_{X_j^L} \otimes b_{X_j^R W}, \tag{75}$$

so eq. (38) implies that $\rho^{(n)}_{U|X}$ and $\rho^{(n)}_{W|X}$ have this structure, as would be
expected if the joint state is conditionally independent and hence satisfies eq. (27).
However, eq. (27) implies an additional constraint that has not been used so
far, namely that $\rho_X$ also respects the same tensor product structure on $\mathcal{H}_X$, i.e.
$\rho_X$ is of the form

$$\rho_X = \sum_{j=1}^{d} p_j \, \sigma_{X_j^L} \otimes \tau_{X_j^R}. \tag{76}$$

More generally, we will say that an operator $C_X$ is decomposable with respect to
the pair of commuting operators $A_{U \cup X}$ and $B_{W \cup X}$ if it has the same algebraic
structure on $\mathcal{H}_X$, i.e. if

$$C_X = \sum_{j=1}^{d} c_{X_j^L} \otimes c_{X_j^R}, \tag{77}$$

for some factorization of $\mathcal{H}_X$ such that eqs. (74) and (75) hold. Imposing
the commutativity of $\rho^{(n)}_{U|X}$ and $\rho^{(n)}_{W|X}$, along with the decomposability of $\rho_X$
with respect to $\rho^{(n)}_{U|X}$ and $\rho^{(n)}_{W|X}$ as additional constraints is enough to
straightforwardly show that any of eqs. (36)-(39) imply conditional independence for all
values of $n$.



4 Graphical Models 

In this section, quantum conditional independence is used to define quantum
Graphical Models that generalize their classical counterparts. The main focus
is on quantum Markov Networks and n-Bifactor Networks, since these allow
for the simplest formulation of the Belief Propagation algorithms to be
described in §5. §4.1 reviews the definition of classical Markov Networks and
the Hammersley-Clifford theorem, which gives an explicit representation for the
probability distributions associated with classical Markov Networks. Motivated
by this, §4.2 defines the class of quantum n-Bifactor Networks, which are the
most general class of networks on which our Belief Propagation algorithms
operate. §4.3 reviews the theory of dependency models and graphoids, which is
useful for proving theorems about Graphical Models, and shows that quantum
conditional independence can be used to define a graphoid. §4.4 defines
quantum Markov Networks and gives some partial characterization results for the
associated quantum states, along similar lines to the Hammersley-Clifford
theorem. Most of these definitions and characterization results are summarized
in an accompanying figure.

Figure 1: The equalities $H(a : d \cup e \cup f \,|\, b \cup c) = 0$,
$H(f : a \cup b \cup c \cup d \,|\, e) = 0$, and $H(a \cup b : e \cup f \,|\, c \cup d) = 0$ are
examples of constraints that are satisfied when $(G, P(V))$ is a Markov Network.

The remaining two subsections briefly outline two other quantum Graphical
Models: Quantum Factor Graphs in §4.5.1 and Quantum Bayesian Networks in
§4.5.2. These structures are equivalent from the point of view of the efficiency
of Belief Propagation algorithms, since it is always possible to convert them
into n-Bifactor Networks and vice-versa with only a linear overhead in graph
size. An explicit method for converting a quantum factor graph into a quantum
1-Bifactor Network is given because factor graphs are used in the application to
quantum error correction developed in §7.1.

4.1 Classical Markov Networks 

Let $G = (V, E)$ be an undirected graph and suppose that each vertex $v \in V$
is associated with a random variable, also denoted $v$. Let $P(V)$ be the joint
distribution of the variables. $(G, P(V))$ is a Classical Markov Network if for all
$U \subseteq V$, $H(U : V - (n(U) \cup U) \,|\, n(U)) = 0$, where $n(U)$ is the set of nearest
neighbors of $U$ in $G$ (see Fig. 1). Further, if $P(V)$ is strictly positive for all
possible valuations of the variables, then $(G, P(V))$ is called a Positive Classical
Markov Network. For such positive networks there is a powerful characterization
theorem [Gri73a, Bes74a].

Theorem 4.1 (Hammersley-Clifford [HC71a]). $(G, P(V))$ is a positive classical
Markov network iff it can be written as

$$P(V) = \frac{1}{Z} \prod_{C \in \mathfrak{C}} \psi(C), \tag{78}$$

where $\mathfrak{C}$ is the set of cliques of $G$, $\psi(C)$ is a positive function defined on the
random variables in $C$ and $Z$ is a normalization factor.

A set of vertices $C \subseteq V$ in a graph is a clique if $\forall u, v \in C$, $u \neq v \Rightarrow (u, v) \in E$,
i.e. every vertex in $C$ is connected to every other vertex in $C$ by an edge.
Note that the decomposition in eq. (78) is generally not unique, even up to
normalization. A distribution of the form of eq. (78) is said to factorize with
respect to the graph $G$.
Markov chains are a special case of Markov Networks in which the graph is
a chain. These are included in the slightly more general class of networks where
the graph is a tree. For trees the only cliques are the individual vertices and the
pairs of vertices that are connected by an edge, and the associated probability
distributions have a representation in terms of marginal and mutual probability
distributions of the form

$$P(V) = \prod_{v \in V} P(v) \prod_{(u,v) \in E} P(u : v), \tag{79}$$

which generalizes the decomposition given earlier for a three-variable Markov
chain. For more general networks wherein the graph has cycles, there is generally
no Hammersley-Clifford decomposition in which the functions $\psi(C)$ are marginal
and mutual probability distributions.

The Hammersley-Clifford decomposition can be put in a form more familiar
to physicists by introducing a positive constant $\beta$ and defining the functions
$H(C) = -\beta^{-1} \log \psi(C)$, which are always well defined since $\psi(C)$ is positive.

Then eq. (78) can be written as

$$P(V) = \frac{1}{Z} \exp\Big( -\beta \sum_{C \in \mathfrak{C}} H(C) \Big), \tag{80}$$

which is a Gibbs state for a system with a Hamiltonian $\sum_{C \in \mathfrak{C}} H(C)$ and
partition function $Z$. This is a generalization of the lattice models studied in
statistical physics to arbitrary graphs. Indeed, if $G$ is a lattice, then, as for trees,
the only cliques are the individual vertices and pairs of vertices connected by
an edge, so for lattices the edges represent local nearest-neighbor interactions.

In many applications, such as in statistical physics, the functions $\psi(C)$ are
often constants for cliques containing three or more vertices even in the case
where the graph has cliques with more than two vertices. In this case, we again
have that the only nontrivial functions are defined on the vertices and edges of
the graph, so the state can be written as

$$P(V) = \frac{1}{Z} \prod_{v \in V} \psi(v) \prod_{(u,v) \in E} \psi(u : v). \tag{81}$$

Here, the edge functions are denoted $\psi(u : v)$ because of the close parallel with
eq. (79), but they are general positive functions rather than mutual
distributions. We adopt the terminology bifactor distribution to describe distributions
of the form of eq. (81) and Bifactor Network for the pair $(G, P(V))$. For
example, the distribution associated with a local nearest-neighbor model on an
arbitrary graph, such as the spin-glasses studied in statistical physics, would be
a bifactor distribution.
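
As a small numerical illustration of eq. (81), the following Python sketch builds a bifactor distribution on a three-vertex chain and checks that the end vertices are conditionally independent given the middle one. The helper names (`bifactor_distribution`, `cond_mutual_info`), the binary variables and the randomly drawn positive factors are assumptions made for this example only; they do not come from the paper.

```python
import itertools
import numpy as np

def bifactor_distribution(n_vertices, edges, psi_v, psi_e, d=2):
    """Joint table P(V) proportional to prod_v psi_v[v][x_v] * prod_(u,v) psi_e[(u,v)][x_u, x_v]."""
    p = np.zeros((d,) * n_vertices)
    for x in itertools.product(range(d), repeat=n_vertices):
        weight = 1.0
        for v in range(n_vertices):
            weight *= psi_v[v][x[v]]
        for (u, v) in edges:
            weight *= psi_e[(u, v)][x[u], x[v]]
        p[x] = weight
    return p / p.sum()  # division by the normalization factor Z

def shannon(p):
    """Shannon entropy (in nats) of a probability table."""
    q = p[p > 0]
    return float(-(q * np.log(q)).sum())

def marginal(p, keep):
    """Sum out every variable whose index is not listed in `keep`."""
    drop = tuple(i for i in range(p.ndim) if i not in keep)
    return p.sum(axis=drop)

def cond_mutual_info(p, U, W, X):
    """H(U : W | X) = H(UX) + H(WX) - H(X) - H(UWX)."""
    return (shannon(marginal(p, U + X)) + shannon(marginal(p, W + X))
            - shannon(marginal(p, X)) - shannon(marginal(p, U + W + X)))

# Hypothetical example: chain 0 - 1 - 2 with random positive vertex and edge functions.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2)]
psi_v = [rng.uniform(0.5, 2.0, size=2) for _ in range(3)]
psi_e = {e: rng.uniform(0.5, 2.0, size=(2, 2)) for e in edges}
P = bifactor_distribution(3, edges, psi_v, psi_e)

# Vertex 1 separates 0 from 2, so H(0 : 2 | 1) should vanish up to rounding.
cmi_chain = cond_mutual_info(P, [0], [2], [1])
```

The brute-force enumeration is exponential in the number of vertices and is only meant to make the definition concrete on a tiny graph.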

4.2 Quantum Bifactor Networks 

A proper generalization of Markov Networks to quantum theory involves the
replacement of random variables with quantum systems and the replacement of
classical conditional independence with its quantum counterpart. This theory
is developed in the following sections, but it is convenient to first introduce a
class of states that parallels the classical bifactor distributions of eq. (81).

Let $G = (V, E)$ be a graph, and let each vertex $v$ be associated to a quantum
system with Hilbert space $\mathcal{H}_v$. Let $\mathcal{H}_V = \bigotimes_{v \in V} \mathcal{H}_v$ and consider the class of
states $\rho_V$ that can be expressed as

$$\rho_V = \frac{1}{Z} \Big( \bigotimes_{v \in V} \mu_v \Big) \star^{(n)} \Big( \mathop{\star^{(n)}}_{(v,w) \in E} \nu_{v:w} \Big), \tag{82}$$

where $Z$ is a normalization constant, the $\mu_v$'s are operators on $\mathcal{H}_v$ and the
$\nu_{v:w} = \nu_{w:v}$ are operators on $\mathcal{H}_v \otimes \mathcal{H}_w$. As stated, this expression is ambiguous
because the $\star^{(n)}$ product is neither commutative nor associative apart from in
the limit $n \to \infty$. To avoid this ambiguity we impose the additional constraint
that $[\nu_{u:v}, \nu_{w:x}] = 0$ for finite $n$, in which case the expression
$\mathop{\star^{(n)}}_{(v,w) \in E} \nu_{v:w}$ reduces to $\prod_{(v,w) \in E} \nu_{v:w}$. The state $\rho_V$ is an
n-bifactor state if it can be written

$$\rho_V = \frac{1}{Z} \Big( \bigotimes_{v \in V} \mu_v \Big) \star^{(n)} \Big( \prod_{(v,w) \in E} \nu_{v:w} \Big) \tag{83}$$

with $[\nu_{u:v}, \nu_{w:x}] = 0$, and it is an $\infty$-bifactor state if it can be written as

$$\rho_V = \frac{1}{Z} \Big( \bigotimes_{v \in V} \mu_v \Big) \odot \Big( \bigodot_{(v,w) \in E} \nu_{v:w} \Big) \tag{84}$$

with no commutativity constraint on the $\nu_{v:w}$. The pair $(G, \rho_V)$ is referred to
as a quantum n-Bifactor Network, or $\infty$-Bifactor Network, respectively.
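
The products entering these definitions can be explored numerically. The sketch below assumes the conventions $A \star^{(n)} B = (A^{1/2n} B^{1/n} A^{1/2n})^n$ and $A \odot B = \exp(\log A + \log B)$ for positive definite operators; the function names (`herm_fun`, `star`, `odot`) and the random test matrices are hypothetical choices made for the illustration.

```python
import numpy as np

def herm_fun(a, f):
    """Apply the scalar function f to a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T

def star(a, b, n):
    """A star^(n) B = (A^(1/2n) B^(1/n) A^(1/2n))^n for positive definite A and B."""
    ra = herm_fun(a, lambda w: w ** (1.0 / (2 * n)))
    rb = herm_fun(b, lambda w: w ** (1.0 / n))
    inner = ra @ rb @ ra
    inner = (inner + inner.conj().T) / 2  # remove rounding asymmetry before eigh
    return herm_fun(inner, lambda w: w ** n)

def odot(a, b):
    """A odot B = exp(log A + log B), the n -> infinity limit of the star product."""
    return herm_fun(herm_fun(a, np.log) + herm_fun(b, np.log), np.exp)

def random_positive(d, rng):
    """A random positive definite d x d matrix (illustrative test data)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return g @ g.conj().T + np.eye(d)

rng = np.random.default_rng(0)
A, B = random_positive(3, rng), random_positive(3, rng)

# n = 1 reduces to the symmetric product A^(1/2) B A^(1/2).
sqrt_A = herm_fun(A, np.sqrt)
n1_matches = np.allclose(star(A, B, 1), sqrt_A @ B @ sqrt_A)

# The finite-n product is not commutative for generic A and B ...
commutes = np.allclose(star(A, B, 1), star(B, A, 1))

# ... but for large n it approaches the commutative odot product.
limit = odot(A, B)
rel_err = np.linalg.norm(star(A, B, 500) - limit) / np.linalg.norm(limit)
```

Working through eigendecompositions keeps every intermediate matrix Hermitian, which is why the sketch is restricted to strictly positive operators.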

It turns out that not every quantum Bifactor Network is a quantum Markov
Network, but the quantum generalizations of Belief Propagation algorithms to
be developed in §5 can be formulated for any Bifactor Network. Therefore,
readers who are mainly interested in algorithms and applications rather than
proofs can skip to §5, perhaps pausing to read §4.5.1 on the way in order to
understand the application to quantum error correction.

The next goal is to formulate the theory of quantum Markov Networks and
provide characterization theorems analogous to the Hammersley-Clifford
theorem. In order to do so it is convenient to first introduce the theory of
dependency models and graphoids, which is useful for proving theorems about
Graphical Models.

4.3 Dependency Models and Graphoids 

Graphs and conditional independence relations share a number of important
properties that are responsible for the structure of Graphical Models. These
properties are also shared by a number of other mathematical structures and
they can be abstracted into structures known as dependency models and graphoids,
which were introduced by Geiger, Verma, and Pearl [VP90a, GVP90a]. Here,
the theory is briefly reviewed and quantum conditional independence is shown
to also give rise to a graphoid.

A dependency model $M$ over a finite set $V$ is a tripartite relation over disjoint
subsets of $V$. The statement that $(U, W, X) \in M$ will be denoted $I(U, W|X)$,
with a possible subscript on the $I$ to denote the type of dependency model.
$I(U, W|X)$ should be taken to mean that "$U$ and $W$ only interact via $X$", or
that "$U$ and $W$ are independent given $X$".

Example 4.2. An Undirected Graph Dependency Model $I_G$ is defined in terms
of an undirected graph $G$. Let $V$ be the set of vertices of $G$ and then let
$I_G(U, W|X)$ if every path from a vertex in $U$ to a vertex in $W$ passes through a
vertex in $X$. $I_G$ is often called the Global Markov Property.

Example 4.3. A Probabilistic Dependency Model $I_P$ is defined in terms of a
probability distribution $P(V)$ over a set $V$ of random variables. $I_P(U, W|X)$ is
true if $U$ and $W$ are conditionally independent given $X$.

Example 4.4. A Quantum Dependency Model $I_\rho$ is defined in terms of a
density operator $\rho_V$ acting on the tensor product of Hilbert spaces labeled by
elements of a set $V$. $I_\rho(U, W|X)$ is true if $U$ and $W$ are quantum conditionally
independent given $X$.

A graphoid is a dependency model that for all disjoint $U, W, X, Y \subseteq V$ satisfies
the following axioms:

Symmetry: $I(U, W|X) \Rightarrow I(W, U|X)$ (85)

Decomposition: $I(U, W \cup Y|X) \Rightarrow I(U, W|X)$ (86)

Weak Union: $I(U, W \cup Y|X) \Rightarrow I(U, W|X \cup Y)$ (87)

Contraction: $I(U, W|X)$ and $I(U, Y|X \cup W) \Rightarrow I(U, W \cup Y|X)$ (88)

A positive graphoid is a graphoid that also satisfies the additional axiom

Intersection: $I(U, W|X \cup Y)$ and $I(U, Y|W \cup X) \Rightarrow I(U, W \cup Y|X)$. (89)

Theorem 4.5. The quantum dependency model is a graphoid.

Proof. Symmetry is immediate because $S(U : W|X)$ is invariant under exchange
of $U$ and $W$. Decomposition and Weak Union follow from the strong subadditivity
inequality. Specifically, for $A, B, C \subseteq V$, strong subadditivity asserts that
$S(A : B|C) \geq 0$, or in terms of von Neumann entropies

$$S(A \cup C) + S(B \cup C) - S(C) - S(A \cup B \cup C) \geq 0. \tag{90}$$

Decomposition asserts that if $S(U : W \cup Y|X) = 0$ then $S(U : W|X) = 0$. This
is true if $S(U : W \cup Y|X) - S(U : W|X) \geq 0$, since $S(U : W|X)$ is guaranteed to
be non-negative by strong subadditivity. Expanding $S(U : W \cup Y|X) - S(U : W|X)$
and canceling terms gives

$$S(U : W \cup Y|X) - S(U : W|X) = S(U \cup W \cup X) + S(W \cup X \cup Y) - S(W \cup X) - S(U \cup W \cup X \cup Y), \tag{91}$$

and the right hand side is non-negative by eq. (90) with $A = U$, $B = Y$, $C = W \cup X$.

Weak Union is proved via a similar argument applied to
$S(U : W \cup Y|X) - S(U : W|X \cup Y)$. It follows from eq. (90) by taking $A = U$, $B = Y$, $C = X$.
Finally, contraction follows from noting that
$S(U : W|X) + S(U : Y|X \cup W) = S(U : W \cup Y|X)$, which is straightforward to show by
expanding in terms of von Neumann entropies. □
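
The entropic relations used in this proof are easy to check numerically. The following sketch draws a random four-qubit mixed state, with qubits 0, 1, 2, 3 playing the roles of $U$, $W$, $X$, $Y$, and evaluates the quantities that appear in the Decomposition, Weak Union and Contraction arguments. The function names (`partial_trace`, `cmi`) and the choice of test state are assumptions made for the illustration.

```python
import numpy as np

def partial_trace(rho, keep, n_qubits):
    """Reduced state of an n-qubit density matrix on the qubits listed in `keep`."""
    keep = sorted(keep)
    t = rho.reshape((2,) * (2 * n_qubits))
    # Trace out unwanted qubits from the highest index down so axis labels stay valid.
    for i in sorted(set(range(n_qubits)) - set(keep), reverse=True):
        t = np.trace(t, axis1=i, axis2=i + t.ndim // 2)
    d = 2 ** len(keep)
    return t.reshape(d, d)

def entropy(rho):
    """Von Neumann entropy in nats."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log(w)).sum())

def S(rho, subset, n):
    return entropy(partial_trace(rho, subset, n)) if subset else 0.0

def cmi(rho, U, W, X, n):
    """S(U : W | X) = S(UX) + S(WX) - S(X) - S(UWX)."""
    return S(rho, U + X, n) + S(rho, W + X, n) - S(rho, X, n) - S(rho, U + W + X, n)

# Random full-rank state on 4 qubits (illustrative test data).
n = 4
rng = np.random.default_rng(0)
g = rng.normal(size=(16, 16)) + 1j * rng.normal(size=(16, 16))
rho = g @ g.conj().T
rho /= np.trace(rho).real

U, W, X, Y = [0], [1], [2], [3]
joint = cmi(rho, U, W + Y, X, n)                     # S(U : W u Y | X)
decomposition_gap = joint - cmi(rho, U, W, X, n)     # >= 0 by eq. (90)
weak_union_gap = joint - cmi(rho, U, W, X + Y, n)    # >= 0 by eq. (90)
chain_rule_gap = cmi(rho, U, W, X, n) + cmi(rho, U, Y, X + W, n) - joint  # = 0
```

For a generic state all conditional mutual informations are strictly positive, so the check only confirms the inequalities and the chain rule, not conditional independence itself.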

The well-known analogous result for classical probability distributions fol- 
lows immediately because classical probability distributions can be represented 
by density matrices that are diagonal in an orthonormal product basis, and for 
such states the von Neumann entropies of subsystems are equal to the Shannon 
entropies of the corresponding marginal distributions. Additionally, if P{V) is 
positive for all possible valuations of the variables then the associated depen- 
dency model is actually a positive graphoid. The analogous quantum property 
would be to require that pv is a strictly positive operator, i.e. it is of full rank, 
but we have not been able to prove that this property implies intersection. 

The undirected graph dependency model is also a positive graphoid. The 
proof is straightforward, so it is not given here. The following theorem is im- 
portant for the theory of Markov networks. 

Theorem 4.6 (Lauritzen [Lau06a]). The undirected graph dependency model is
equivalent to the dependency model obtained by setting
$I(U, V - (U \cup n(U)) \,|\, n(U))$ for all $U$, where $n(U)$ is the set of nearest neighbors
of $U$, and demanding closure under the positive graphoid axioms.

The condition $I(U, V - (U \cup n(U)) \,|\, n(U))$ defines the Local Markov Property
on a graph. Note that although its closure under the positive graphoid axioms
is equivalent to the Global Markov Property, this is not the case for a graphoid
that doesn't satisfy intersection [Lau06a].

4.4 Quantum Markov Networks 

Using the terminology of the previous section, the definition of a classical
Markov Network can be conveniently reformulated as a pair $(G, P(V))$, where
$G = (V, E)$ is an undirected graph and $P(V)$ is a probability distribution over
random variables represented by the vertices, such that the graphoid $I_P$
satisfies the local Markov property with respect to the graph $G$. The definition of
a quantum Markov network can now be obtained by replacing the probabilistic
dependency model with a quantum dependency model.

Let $G = (V, E)$ be an undirected graph and suppose that each vertex $v \in V$
is associated with a quantum system, also denoted $v$, with Hilbert space $\mathcal{H}_v$.
Let $\rho_V$ be a state on $\mathcal{H}_V = \bigotimes_{v \in V} \mathcal{H}_v$. $(G, \rho_V)$ is a Quantum Markov Network if
the graphoid $I_\rho$ satisfies the local Markov property with respect to the graph $G$.
Further, if $\rho_V$ is of full rank, then $(G, \rho_V)$ is called a Positive Quantum Markov
Network. Note that unlike in the classical case, we cannot conclude that the
global Markov property holds for positive quantum Markov networks because
the intersection axiom has not been proved.
The remainder of this section provides some partial characterization results
for quantum Markov networks, along the lines of the Hammersley-Clifford
theorem. The most generally applicable of these results makes use of the $\odot$ product.

Theorem 4.7. Let $G = (V, E)$ be an undirected graph and let $\mathfrak{C}$ be the set of
cliques of $G$. If $(G, \rho_V)$ is a positive quantum Markov network then there exist
positive operators $\sigma_C$ acting on the cliques of $G$, i.e. $C \in \mathfrak{C}$, such that

$$\rho_V = \bigodot_{C \in \mathfrak{C}} \sigma_C. \tag{92}$$

This theorem is analogous to one direction of the Hammersley-Clifford
theorem and the proof is very similar to a standard proof for the classical case
[Pol04a], but is somewhat involved so it is given in appendix B. However,
unlike the classical case, the converse does not hold, i.e. there are states of the
form eq. (92) that do not satisfy the local Markov property, as illustrated by
the following example.

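Example 4.8. Consider a chain of 3 qubits $A$, $B$, and $C$ coupled through an
anti-ferromagnetic Heisenberg interaction
$H = \sigma_A^x \sigma_B^x I_C + \sigma_A^y \sigma_B^y I_C + \sigma_A^z \sigma_B^z I_C + I_A \sigma_B^x \sigma_C^x + I_A \sigma_B^y \sigma_C^y + I_A \sigma_B^z \sigma_C^z$,
where $\sigma^x$, $\sigma^y$, and $\sigma^z$ denote the Pauli operators

$$\sigma^x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma^y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \text{and} \quad \sigma^z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{93}$$

The Gibbs state $\rho_{A \cup B \cup C}(\beta) = \frac{1}{Z(\beta)} \exp(-\beta H)$ has the form eq. (92), but for
any finite $\beta > 0$ it has a non-zero mutual information between $A$ and $C$ conditioned
on $B$, as shown in Fig. 2.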
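
The claim of this example can be reproduced with a few lines of Python. The sketch below assumes the Hamiltonian and Gibbs state as written in the example and evaluates $S(A : C|B)$ at a given inverse temperature; the function names (`gibbs`, `cmi_A_C_given_B`) are hypothetical and the entropies are in nats.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]])
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

# Heisenberg chain A - B - C: sigma_A.sigma_B + sigma_B.sigma_C
H = sum(kron3(s, s, I2) + kron3(I2, s, s) for s in (SX, SY, SZ))

def gibbs(beta):
    """Gibbs state exp(-beta H) / Z, computed in the eigenbasis of H."""
    e, v = np.linalg.eigh(H)
    w = np.exp(-beta * (e - e.min()))  # shift energies for numerical stability
    return (v * (w / w.sum())) @ v.conj().T

def partial_trace(rho, keep, n_qubits=3):
    """Reduced state on the qubits listed in `keep`."""
    t = rho.reshape((2,) * (2 * n_qubits))
    for i in sorted(set(range(n_qubits)) - set(keep), reverse=True):
        t = np.trace(t, axis1=i, axis2=i + t.ndim // 2)
    d = 2 ** len(keep)
    return t.reshape(d, d)

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log(w)).sum())

def cmi_A_C_given_B(beta):
    """S(A : C | B) = S(AB) + S(BC) - S(B) - S(ABC) for the Gibbs state."""
    rho = gibbs(beta)
    return (entropy(partial_trace(rho, [0, 1])) + entropy(partial_trace(rho, [1, 2]))
            - entropy(partial_trace(rho, [1])) - entropy(rho))

values = {beta: cmi_A_C_given_B(beta) for beta in (0.0, 0.5, 1.0, 2.0)}
```

At $\beta = 0$ the state is maximally mixed and the conditional mutual information vanishes, which is why the statement is restricted to $\beta > 0$.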

For trees, a decomposition into reduced and mutual density operators
analogous to eq. (79) is possible. For this, we need the following lemma.

Lemma 4.9. Let $G = (V, E)$ be a graph, let $(G, \rho_V)$ be a quantum Markov
network and let $u \in V$. Let $G' = (V', E')$ be the graph obtained by removing $u$
from $V$ and removing all edges that connect $u$ to any other vertex from the graph.
Let $G'' = (V', E'')$ be the graph obtained by adding to $G'$ an edge between every
pair of distinct neighbors of $u$ in the original graph $G$. Let $\rho_{V'} = \mathrm{Tr}_u(\rho_V)$.
Then $(G'', \rho_{V'})$ is a quantum Markov network.

Proof. For $U \subseteq V$, let $U_u = U - u$ if $u \in U$ and $U_u = U$ otherwise, and denote by
$n_G(U_u)$ and $n_{G''}(U_u)$ the neighbors of $U_u$ in the graphs $G$ and $G''$ respectively.
It must be shown that $I_{\rho_V}(U, V - (U \cup n_G(U)) \,|\, n_G(U))$ for all $U \subseteq V$ implies
$I_{\rho_{V'}}(U_u, V' - (U_u \cup n_{G''}(U_u)) \,|\, n_{G''}(U_u))$ for every $U_u \subseteq V'$. By symmetry, we
can assume without loss of generality that $u \in U$. There are two different cases
to consider:

Case I: $n_G(u) \cap U \neq \emptyset$.

This implies that $n_{G''}(U_u) = n_G(U)$ and so
$V' - (U_u \cup n_{G''}(U_u)) = V - (U \cup n_G(U))$. We conclude that
$I_{\rho_{V'}}(U_u, V' - (U_u \cup n_{G''}(U_u)) \,|\, n_{G''}(U_u))$ is equivalent
to $I_{\rho_V}(U - u, V - (U \cup n_G(U)) \,|\, n_G(U))$, and the result follows from decomposition.

Case II: $n_G(u) \cap U = \emptyset$.

This implies that $n_{G''}(U_u) = n_G(U_u)$. Consider the local Markov property on
the original graph $G$ applied to $U_u$: $I_{\rho_V}(U_u, V - (U_u \cup n_G(U_u)) \,|\, n_G(U_u))$, which is
equivalent to $I_{\rho_V}(U_u, u \cup V' - (U_u \cup n_{G''}(U_u)) \,|\, n_{G''}(U_u))$, and the result follows
from decomposition. □

Figure 2: Conditional mutual information for a 3-vertex anti-ferromagnetic
Heisenberg spin-1/2 chain as a function of inverse temperature $\beta$.

Theorem 4.10. Let $G = (V, E)$ be a tree. If $(G, \rho_V)$ is a positive quantum
Markov network then it can be written as

$$\rho_V = \Big( \bigotimes_{v \in V} \rho_v \Big) \star^{(n)} \Big( \prod_{(u,v) \in E} \rho^{(n)}_{u:v} \Big). \tag{94}$$

Proof. The proof is by induction on the number of vertices in the tree. It is
clearly true for a single vertex, so consider a tree $G = (V, E)$ with $N$ vertices and
choose a leaf vertex $u \in V$. Construct the quantum Markov network $(G'', \rho_{V'})$
as in lemma 4.9. Since $u$ is a leaf it only has one neighbor in $G$, denoted $w$, so
the only difference between $G$ and $G''$ is that $u$ and the single edge connecting
$u$ to the rest of the graph have been removed. By the inductive assumption,
$\rho_{V'}$ has a decomposition of the form

$$\rho_{V'} = \Big( \bigotimes_{v \in V'} \rho_v \Big) \star^{(n)} \Big( \prod_{(v,x) \in E''} \rho^{(n)}_{v:x} \Big). \tag{95}$$

Generally, $\rho_V = \rho_{V' \cup \{u\}} = \rho_{V'} \star^{(n)} \rho^{(n)}_{u|V'}$. The local Markov property implies
that $I_\rho(u, V' - w \,|\, w)$, so that $\rho^{(n)}_{u|V'} = \rho^{(n)}_{u|w}$, which in turn can be written as
$\rho^{(n)}_{u|w} = \rho_u \star^{(n)} \rho^{(n)}_{u:w}$, so

$$\rho_V = \rho_{V'} \star^{(n)} \big( \rho_u \star^{(n)} \rho^{(n)}_{u:w} \big). \tag{96}$$

Every term in eq. (95) commutes with $\rho_u$, because they are defined on different
tensor product factors. Also, $\rho^{(n)}_{u:w}$ commutes with all the other mutual density
operators either because they act on different tensor product factors or because
the fact that $w$ is the only neighbor of $u$ implies that $u$ is quantum conditionally
independent of any other subsystem given $w$. □

In the classical case, the Hammersley-Clifford decomposition is not
necessarily unique, and when the graph is a tree the decomposition into marginal and
mutual distributions is only one possibility. Similarly, a state $\rho_V$ might have a
decomposition of the form of eq. (94) but with more general operators in place
of the mutual and marginal states. This provides another motivation for the
definition of an n-bifactor state that was given in eq. (83). As mentioned in
§4.2, not all n-bifactor states are quantum Markov networks, but a subset of
them are, as shown by the following theorem.

Theorem 4.11. Let $G = (V, E)$ be a tree with each vertex $v \in V$ associated to
a quantum system with Hilbert space $\mathcal{H}_v$. Let $\mathcal{H}_V = \bigotimes_{v \in V} \mathcal{H}_v$ and let $\rho_V$ be an
n-bifactor state on $\mathcal{H}_V$. If $\mu_v$ is decomposable with respect to all pairs $\nu_{u:v}$ and
$\nu_{w:v}$, then $(G, \rho_V)$ is a quantum Markov network.

The notion of decomposability used in the statement of this theorem is
defined at eq. (77). The proof is straightforward and we leave it as an exercise.

4.5 Other Graphical Models 

In this section quantum generalizations of two other Graphical Models are
described: Factor Graphs and Bayesian Networks. Generally, the choice of which
model to use depends on the application and Belief Propagation algorithms have
been developed for all of them in the classical case. For example, Factor Graphs
arise naturally in the theory of error correcting codes, Bayesian Networks are
commonly used to model causal reasoning in artificial intelligence, and Markov
Networks are useful in statistical physics. However, it is now understood that
the classical versions of these three models are interconvertible, and that upon
such conversion the different Belief Propagation algorithms are all equivalent
in complexity [AM00a, YFW02a, KFL01a]. Some similar results also hold for
the quantum case, as we illustrate by showing how a quantum factor graph
can be converted into a 1-Bifactor Network. This construction is used in the
application to quantum error correction described in §7.1.

4.5.1 Quantum Factor Graphs 

A quantum factor graph consists of a pair $(G, \rho_V)$, where $G = (U, E)$ is a
bipartite graph and $\rho_V$ is a quantum state. A bipartite graph is an undirected
graph for which the set of vertices can be partitioned into two disjoint sets,
$V$ and $F$, such that $(v, f) \in E$ only if $v \in V$ and $f \in F$. The vertices in $V$
are referred to as "variable nodes" and those in $F$ as "function nodes". Each
variable node $v$ is associated with a quantum system, also labeled $v$, with a
Hilbert space $\mathcal{H}_v$, and $\rho_V$ is a state on $\bigotimes_{v \in V} \mathcal{H}_v$. The Hilbert space associated
to a function node $f$ is the tensor product of the Hilbert spaces of the adjacent
variable nodes*: $\mathcal{H}_f = \bigotimes_{v \in n(f)} \mathcal{H}_v$. The state associated with a factor graph is
of the form

$$\rho_V = \frac{1}{Z} \Big( \prod_{f \in F} X_f \Big) \star \Big( \bigotimes_{v \in V} \mu_v \Big), \tag{97}$$

where $\mu_v$ is an operator on $\mathcal{H}_v$, $X_f$ is an operator on $\mathcal{H}_f$ and $[X_f, X_{f'}] = 0$.

Figure 3: Factor graph representation of the state $(|000\rangle + |111\rangle)_{uvw}$.

For example, such a state would be obtained after performing a sequence
of projective von Neumann measurements on a product state of the variable
nodes (see Fig. 3). More precisely, for each $f \in F$, let $\{P_j^{(f)}\}$ be a complete
set of orthogonal projectors, and let $\rho_V = \bigotimes_{v \in V} \mu_v$ be the initial state of $V$. When
the projective measurements $\{P_j^{(f)}\}$ are performed at each function node and
commuting outcomes $P_j^{(f)} = X_f$ are obtained, the post-measurement state is of
the form of eq. (97). Similarly, factor graph states could be obtained from
more general POVM measurements $\{E_j^{(f)}\}$, provided the state update rule
$\rho_V \to \sqrt{E_j^{(f)}}\, \rho_V \sqrt{E_j^{(f)}}$ (up to normalization) is used. In that case, the $X_f$
could be any positive operator rather than being restricted to projectors as in
the case of a von Neumann measurement.

To convert a factor graph into a 1-Bifactor Network, we need to treat the
function nodes as distinct quantum systems, and so endow them with their own
Hilbert spaces $\mathcal{H}_f = \bigotimes_{v \in n(f)} \mathcal{H}_{R_v^f}$, where $\mathcal{H}_{R_v^f}$ is isomorphic to $\mathcal{H}_v$. The system
$R_v^f$ is called a reference system for $v$ in $f$. Then, the state of the function nodes
can be written on the graph $G = (U, E)$, where $U = V \cup F$, $\rho_V = \mathrm{Tr}_F(\rho_U)$ and

$$\rho_U = \frac{1}{Z} \Big( \bigotimes_{u \in U} \mu_u \Big) \star \prod_{(v,f) \in E} \nu_{v:f}, \tag{98}$$

where for $f \in F$, $\mu_f = X_f$, $\nu_{v:f} = d_v |\Phi\rangle\langle\Phi|_{v R_v^f}$, and
$|\Phi\rangle_{v R_v^f} = \frac{1}{\sqrt{d_v}} \sum_{j=1}^{d_v} |j\rangle_v |j\rangle_{R_v^f}$ denotes the maximally entangled state
between $v$ and its reference.

* The following equality is not just meant in the sense of an isomorphism; they are the same
Hilbert spaces.

Figure 4: This directed acyclic graph has two distinct ancestral orderings:
$(a, b, c, d)$ and $(a, c, b, d)$. The equalities $S(d : a \,|\, b \cup c) = 0$ and $S(b : c \,|\, a) = 0$
are examples of constraints that are satisfied when $(G, \rho_V)$ is a Quantum
Bayesian Network.

4.5.2 Quantum Bayesian Networks 

Apart from Markov Networks, there are other Graphical Models that make use 
of the theory of dependency models and graphoids. Bayesian Networks provide 
an example, and they are commonly applied in expert systems to model causal 
reasoning [Nea90a, Nea04a]. The basic idea is to replace the undirected graph of
a Markov network with a Directed Acyclic Graph (DAG), wherein the directed
edges represent direct cause-effect relationships. The quantum graphoid can be 
used to give a straightforward generalization of the classical networks, which 
we only treat briefly here. To describe the generalization, a few definitions and 
facts about DAGs are required. 

For a vertex $v$ in a DAG $G = (V, E)$, let $m(v)$ denote the parents of $v$, i.e.
$m(v) = \{u \in V \,|\, (u, v) \in E\}$. The set of ancestors of $v$ is denoted $a(v)$ and
consists of those vertices $u$ for which there exists a path in the graph starting at
$u$ and ending at $v$. Conversely, the set of descendants of $v$ is denoted $d(v)$ and
consists of those vertices $u$ for which there exists a path in the graph starting
at $v$ and ending at $u$. The set of parents of a subset $U \subseteq V$ of vertices is
defined as $m(U) = \bigcup_{u \in U} m(u) - U$ and similarly $a(U) = \bigcup_{u \in U} a(u) - U$ and
$d(U) = \bigcup_{u \in U} d(u) - U$. The set of nondescendants of a subset $U \subseteq V$ of vertices
is defined to be $nd(U) = V - (d(U) \cup U)$. Note that the vertices in $U$ are not
considered to be nondescendants of $U$ for technical convenience. Finally, every
DAG has at least one ancestral ordering of its vertices $(v_1, v_2, \ldots, v_N)$, such that
if $v_j \in a(v_k)$ then $j < k$ (see Fig. 4).
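
The definitions above translate directly into code. The sketch below enumerates the ancestral orderings of a small DAG by brute force and computes the nondescendants of a vertex. The edge list is an assumption: it is one four-vertex graph consistent with the two orderings quoted in the caption of Fig. 4, not a transcription of the figure, and the function names are hypothetical.

```python
from itertools import permutations

def ancestors(edges, v):
    """All vertices u with a directed path from u to v."""
    found, frontier = set(), {v}
    while frontier:
        frontier = {u for (u, w) in edges if w in frontier} - found
        found |= frontier
    return found

def descendants(edges, v):
    """All vertices u with a directed path from v to u."""
    found, frontier = set(), {v}
    while frontier:
        frontier = {w for (u, w) in edges if u in frontier} - found
        found |= frontier
    return found

def parents(edges, v):
    return {u for (u, w) in edges if w == v}

def nondescendants(vertices, edges, v):
    """nd(v) = V - (d(v) u {v})."""
    return set(vertices) - descendants(edges, v) - {v}

def is_ancestral(order, edges):
    """True if every ancestor of a vertex appears before it in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[u] < pos[v] for v in order for u in ancestors(edges, v))

def ancestral_orderings(vertices, edges):
    return [p for p in permutations(vertices) if is_ancestral(p, edges)]

# Assumed DAG: a -> b, a -> c, b -> d, c -> d.
vertices = "abcd"
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
orders = ancestral_orderings(vertices, edges)
nd_b = nondescendants(vertices, edges, "b")
```

For this graph the nondescendants of $b$ that are not its parents reduce to $\{c\}$, so the associated constraint is independence of $b$ and $c$ given $a$. Enumerating permutations is only practical for very small graphs; a topological sort would produce a single ancestral ordering in linear time.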

A Quantum Bayesian Network is a pair $(G, \rho_V)$, where $G = (V, E)$ is a DAG,
each vertex $v \in V$ is associated with a quantum system, also denoted $v$, with
Hilbert space $\mathcal{H}_v$, and $\rho_V$ is a quantum state on $\mathcal{H}_V = \bigotimes_{v \in V} \mathcal{H}_v$. The state $\rho_V$
satisfies the conditional independence constraints $I_\rho(U, nd(U) - m(U) \,|\, m(U))$ for
all subsets $U \subseteq V$.

The definition of a classical Bayesian Network is obtained by replacing
the quantum systems with classical random variables. It can be shown that
$(G, P(V))$ is a classical Bayesian Network iff $P(V) = \prod_{v \in V} P(v \,|\, m(v))$, and a
partial quantum generalization of this can be obtained using the conditional
density operator.

Due to the nonassociativity of the $\star^{(n)}$ products, expressions like
$A \star^{(n)} B \star^{(n)} C$ are ambiguous. It is convenient to adopt the convention that they are
evaluated left-to-right, so that $A \star^{(n)} B \star^{(n)} C = (A \star^{(n)} B) \star^{(n)} C$. Similarly,
we adopt the convention that

$$\mathop{\star^{(n)}}_{j=1}^{N} A_j = (((A_1 \star^{(n)} A_2) \star^{(n)} A_3) \ldots) \star^{(n)} A_N. \tag{99}$$

Theorem 4.12. If $(G, \rho_V)$ is a Quantum Bayesian Network and $(v_1, v_2, \ldots, v_N)$
is an ancestral ordering of $V$ then

$$\rho_V = \mathop{\star^{(n)}}_{j=1}^{N} \rho^{(n)}_{v_j | m(v_j)}. \tag{100}$$

Proof. For any ordering $(v_1, v_2, \ldots, v_N)$ of the vertices, an arbitrary state can
always be written as

$$\rho_V = \mathop{\star^{(n)}}_{j=1}^{N} \rho^{(n)}_{v_j | v_{j-1} \ldots v_1}. \tag{101}$$

This is a quantum generalization of the chain rule for conditional probabilities,
which follows straightforwardly from the definition of conditional density
operators. If $(v_1, v_2, \ldots, v_N)$ is in fact an ancestral ordering, then
$\{v_{j-1}, v_{j-2}, \ldots, v_1\} \subseteq nd(v_j)$, so $I_\rho(v_j, nd(v_j) \,|\, m(v_j))$ implies that
$\rho^{(n)}_{v_j | v_{j-1} \ldots v_1} = \rho^{(n)}_{v_j | m(v_j)}$. □
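
The quantum chain rule used in this proof can be checked numerically on three qubits $A$, $B$, $C$. The sketch assumes the conventions $A \star^{(n)} B = (A^{1/2n} B^{1/n} A^{1/2n})^n$ and $\rho^{(n)}_{B|A} = \rho_A^{-1} \star^{(n)} \rho_{AB}$, with operators on subsystems extended by identities; the function names and the random full-rank test state are assumptions made for the illustration.

```python
import numpy as np

def herm_fun(a, f):
    """Apply the scalar function f to a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T

def star(a, b, n):
    """A star^(n) B = (A^(1/2n) B^(1/n) A^(1/2n))^n for positive definite A and B."""
    ra = herm_fun(a, lambda w: w ** (1.0 / (2 * n)))
    rb = herm_fun(b, lambda w: w ** (1.0 / n))
    inner = ra @ rb @ ra
    return herm_fun((inner + inner.conj().T) / 2, lambda w: w ** n)

def inverse(a):
    return herm_fun(a, lambda w: 1.0 / w)

# Random full-rank state on three qubits A, B, C (illustrative test data).
rng = np.random.default_rng(1)
g = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
rho_abc = g @ g.conj().T + 0.1 * np.eye(8)
rho_abc /= np.trace(rho_abc).real

t = rho_abc.reshape((2,) * 6)
rho_ab = np.einsum("abcdec->abde", t).reshape(4, 4)  # trace out C
rho_a = np.einsum("abcdbc->ad", t)                   # trace out B and C
I2 = np.eye(2)

def chain_rule(n):
    """Evaluate (rho_A star rho_{B|A}) star rho_{C|AB}, left to right."""
    a_ext = np.kron(rho_a, np.eye(4))
    ab_ext = np.kron(rho_ab, I2)
    cond_b = np.kron(star(inverse(np.kron(rho_a, I2)), rho_ab, n), I2)  # rho^(n)_{B|A}
    cond_c = star(inverse(ab_ext), rho_abc, n)                          # rho^(n)_{C|AB}
    return star(star(a_ext, cond_b, n), cond_c, n)

reconstructed = {n: chain_rule(n) for n in (1, 2, 5)}
```

The left-to-right bracketing matters here: the first product rebuilds $\rho_{AB}$, which is exactly the operator needed to undo the inverse inside $\rho^{(n)}_{C|AB}$.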



5 Quantum Belief Propagation 

In this section, we discuss algorithms for solving the inference problem that we
started with in §1 for the case of n-Bifactor Networks. In fact, we start with the
seemingly simpler problem of computing the reduced density operators of the
state on the vertices and on pairs of vertices connected by an edge, and then
present a simple modification of the algorithm to solve the inference problem
for local measurements.

Recall that n-bifactor states are of the form

$$\rho_V = \frac{1}{Z} \Big( \bigotimes_{v \in V} \mu_v \Big) \star^{(n)} \Big( \prod_{(v,w) \in E} \nu_{v:w} \Big) \tag{102}$$

and that the operators associated with vertices and edges do not have to be
straightforwardly related to the reduced and mutual density operators.
Therefore, it is not clear a priori that even the simpler task can be done efficiently.
Quantum Belief Propagation (QBP) algorithms are designed to solve this
problem by exploiting the special structure of n-bifactor states. Since the class of
states under consideration is different for each value of $n$, there is not one but a
family of algorithms. The algorithm that is designed to solve inference problems
on n-Bifactor Networks is denoted QBP$^{(n)}$.

To avoid cumbersome notation, focus will be given to n-bifactor states with
$n < \infty$. Recall that the operators $\nu_{u:v}$ defining these states mutually commute.
This is not true of $\infty$-bifactor states. Nevertheless, a Belief Propagation
algorithm for $\infty$-bifactor states can be readily defined from the finite $n$ one, by
replacing all products appearing in eqs. (103)-(105) by the $\odot$ product. Under this
modification, the convergence Theorem 5.6 applies to $\infty$-Bifactor Networks, and
its proof only requires straightforward modifications.

The remainder of this section is structured as follows. §5.1 gives a description
of the QBP algorithms and §5.2 shows that QBP$^{(n)}$ converges on trees if the
n-Bifactor Network is also a quantum Markov Network and that QBP$^{(1)}$ converges
on trees in general. In both cases, the algorithm converges in a time that scales
linearly with the diameter of the tree. Finally, §5.3 explains how to modify the
algorithm to solve inference problems for local measurements.

5.1 Description of the Algorithm 

To describe the operation of the QBP algorithms, it is helpful to imagine that
the graph $G$ represents a network of computers with a processor situated at each
vertex. The algorithm could equally well be implemented on a single processor,
in which case the network is just a convenient fiction. Pairs of processors are
connected by a communication channel if there is an edge between the
corresponding vertices. The processor at vertex $u$ has a memory that stores the value
of $\mu_u$ as well as the value of $\nu_{u:v}$ for each vertex $v$ that is adjacent to $u$ in the
graph. The task assigned to each processor is to compute the local reduced
state $\rho_u$ and the joint states $\rho_{u \cup v}$*. At each time step $t$, the processor at $u$
updates its "beliefs" about $\rho_u$ and $\rho_{u \cup v}$ via an iterative formula. These beliefs are
denoted $b_u^{(n)}(t)$ and $b_{uv}^{(n)}(t)$, and are supposed to be approximations to the true
reduced states $\rho_u$ and $\rho_{u \cup v}$ based on the information available to the processor
at time step $t$. Since the reduced states may depend on information stored at
other vertices, the processors pass operator valued messages $m^{(n)}_{u \to v}(t)$ along the
edges at each time step in order to help their neighbors. The message $m^{(n)}_{u \to v}(t)$
is an operator on $\mathcal{H}_v$ and is initialized to the identity operator $m^{(n)}_{u \to v}(0) = I_v$
at $t = 0$. For $t > 0$ it is computed via the iterative formula

$$m^{(n)}_{u \to v}(t) = \frac{1}{Y} \mathrm{Tr}_u \Big[ \mu_u \star^{(n)} \Big( \nu_{u:v} \prod_{v' \in n(u) - v} m^{(n)}_{v' \to u}(t-1) \Big) \Big]. \tag{103}$$

Here, $Y$ is an arbitrary normalization factor that should be chosen to prevent the
matrix elements of $m^{(n)}_{u \to v}(t)$ becoming increasingly small as the algorithm
proceeds. It is convenient to choose $Y$ such that $\mathrm{Tr}_v \big( m^{(n)}_{u \to v}(t) \big) = 1$.

* Of course, it would be sufficient to have only one processor compute $\rho_{u \cup v}$ for each edge.

The beliefs about the local density operator at time $t$ are given by the
simple formula

$$b_u^{(n)}(t) = \frac{1}{Y'} \mu_u \star^{(n)} \Big( \prod_{v' \in n(u)} m^{(n)}_{v' \to u}(t) \Big), \tag{104}$$

where $Y'$ is again a normalization factor that should be chosen to make
$\mathrm{Tr}_u \big( b_u^{(n)}(t) \big) = 1$. On the other hand, the beliefs about $\rho_{u \cup v}$ also depend on the
messages received by the processor at $v$, so we have to imagine that each vertex shares
its messages with its neighbors. Having done so, the beliefs about $\rho_{u \cup v}$ are
computed via

$$b_{uv}^{(n)}(t) = \frac{1}{Y''} (\mu_u \mu_v) \star^{(n)} \Big[ \nu_{u:v} \prod_{w \in n(u) - v} m^{(n)}_{w \to u}(t) \prod_{w' \in n(v) - u} m^{(n)}_{w' \to v}(t) \Big], \tag{105}$$

where $Y''$ is again a normalization factor.

The beliefs obtained from the QBP$^{(n)}$ algorithm on input $\{\mu_u\}_{u \in V}$ and
$\{\nu_{u:v}\}_{(u,v) \in E}$ after $t$ time steps are denoted
$[b_u^{(n)}(t), b_{uv}^{(n)}(t)] = \mathrm{QBP}_t^{(n)}(\mu_u, \nu_{u:v})$.
The goal of the next section is to provide conditions under which the beliefs
represent the exact solution to the inference problem, i.e. to find states and
values of $t$ such that $\mathrm{QBP}_t^{(n)}(\mu_u, \nu_{u:v}) = [\rho_u, \rho_{u \cup v}]$.

5.2 Convergence on Trees 

At time $t$, the beliefs $b^{(n)}_u(t)$ and $b^{(n)}_{uv}(t)$ represent estimates of the reduced states
$\rho_u$ and $\rho_{u \cup v}$ of the input n-bifactor state $\rho_V$. Note that when the $\mu_u$ and the
$\nu_{u:v}$ all commute with one another and are diagonal in a local basis, the QBP$^{(n)}$
algorithms all coincide for different $n$ (including $n = \infty$) and correspond to
the well known classical Belief Propagation algorithm. This algorithm always
converges on trees in a time that scales like the diameter of the tree. Its
convergence on general graphs is not fully understood and constitutes an active area
of research [Yed01a, YFW02a]. In the quantum setting, the $\mu_u$ and the $\nu_{u:v}$ do
not commute in general, but for finite $n$, the $\nu_{u:v}$ commute with each other by
assumption. This has a straightforward consequence that will be of use later.

Proposition 5.1. For all $u, v \in V$, $x \in n(u)$, and $w \in n(v)$, the following
commutation relations hold: $[\nu_{u:v}, m^{(n)}_{x \to u}(t)] = 0$ and $[m^{(n)}_{w \to v}(t), m^{(n)}_{x \to u}(t)] = 0$.

Before proving the convergence of Quantum Belief Propagation, the following 
classical example can help build intuition of its workings, and also serves to 
outline the crucial steps in proving convergence. 

Figure 5: Belief $b_{uv}$ is a function of $\mu_u$, $\mu_v$, $\nu_{u:v}$, and the incoming messages at
vertices $u$ and $v$ except $m_{v \to u}$ and $m_{u \to v}$.

Example 5.2. Consider the function $P$ of $N$ discrete variables $x_j \in \{1, 2, \ldots, d\}$

$$P(x_1, x_2, \ldots, x_N) = \psi(x_1, x_2)\, \psi(x_2, x_3) \cdots \psi(x_{N-1}, x_N), \qquad (106)$$

which could be for instance a classical bifactor distribution on a chain with $N$
sites. To evaluate the marginal function $P(x_N) = \sum_{x_1, x_2, \ldots, x_{N-1}} P(x_1, x_2, \ldots, x_N)$,
one can proceed directly and carry out the sum over $d^{N-1}$ terms for each value of $x_N$.
A more efficient solution is obtained by invoking the distributive law to reorder
the various sums and products into

$$P(x_N) = \sum_{x_{N-1}} \left( \psi(x_{N-1}, x_N) \left( \cdots \left( \sum_{x_2} \psi(x_2, x_3) \left( \sum_{x_1} \psi(x_1, x_2) \right) \right) \cdots \right) \right)$$

and performing the sums sequentially, starting with $x_1$, then $x_2$, and so on:

$$P(x_N) = \sum_{x_{N-1}} \left( \psi(x_{N-1}, x_N) \left( \cdots \left( \sum_{x_2} \psi(x_2, x_3)\, M_{1 \to 2}(x_2) \right) \cdots \right) \right)$$
$$= \sum_{x_{N-1}} \left( \psi(x_{N-1}, x_N) \left( \cdots M_{2 \to 3}(x_3) \cdots \right) \right)$$
$$\vdots$$
$$= \sum_{x_{N-1}} \psi(x_{N-1}, x_N)\, M_{N-2 \to N-1}(x_{N-1}),$$

where the "messages" are defined recursively by $M_{j \to j+1}(x_{j+1}) = \sum_{x_j} \psi(x_j, x_{j+1})\, M_{j-1 \to j}(x_j)$
with $M_{1 \to 2}(x_2) = \sum_{x_1} \psi(x_1, x_2)$. Each of these steps involves the sum of $d^2$ terms,
so $P(x_N)$ can be computed with order $N d^2$ operations.
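The saving in Example 5.2 can be checked numerically. The following is a minimal sketch, not part of the original presentation: the function names and the edge weights `psi` are hypothetical, vertices are 0-indexed, and the same table $\psi$ is assumed on every edge, as in eq. (106).

```python
from itertools import product


def chain_marginal_bruteforce(psi, N, d):
    """P(x_N) obtained by summing the full product over x_1, ..., x_{N-1}."""
    marginal = [0] * d
    for xs in product(range(d), repeat=N):
        weight = 1
        for j in range(N - 1):
            weight *= psi[xs[j]][xs[j + 1]]
        marginal[xs[-1]] += weight
    return marginal


def chain_marginal_messages(psi, N, d):
    """P(x_N) via M_{j->j+1}(y) = sum_x psi(x, y) * M_{j-1->j}(x)."""
    message = [1] * d  # trivial message entering the first vertex
    for _ in range(N - 1):
        message = [sum(psi[x][y] * message[x] for x in range(d)) for y in range(d)]
    return message


# Hypothetical edge weights for d = 3 (integers, so the comparison is exact).
psi = [[1, 2, 0], [0, 1, 3], [2, 1, 1]]
assert chain_marginal_messages(psi, 5, 3) == chain_marginal_bruteforce(psi, 5, 3)
```

The brute-force routine visits $d^N$ configurations in total, whereas the message recursion performs $N - 1$ updates of $d^2$ multiplications each.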

This example differs from the Belief Propagation algorithm described in the
previous section in three important aspects. Firstly, it relied on the distributive
law, which does not hold in general for the $\star^{(n)}$ product, i.e. $\mathrm{Tr}_u (X_{uv} \star^{(n)} Y_{vw}) \neq
\mathrm{Tr}_u (X_{uv}) \star^{(n)} Y_{vw}$ in general. This will motivate Theorems 5.4 and 5.5, which
establish conditions under which the distributive law is valid. Secondly,
the graph in that example is a chain, whereas Belief Propagation operates on
any graph. However, Belief Propagation is only guaranteed to converge on trees,
and the above example generalizes straightforwardly to such graphs. Thirdly,
the messages in the example must be computed in a prescribed order: $M_{i-1 \to i}$
is required to compute $M_{i \to i+1}$. This last point is important and deserves an
extensive explanation.

Figure 6: For $(u, v) \in E$, the graph $G^u_v$ is obtained from $G$ by considering $u$
as the root and removing the subtree associated to vertex $v$. In this example,
depth$(G^u_v) = 2$.

Suppose that instead of computing the messages $M_{i \to i+1}$ sequentially,
messages at each vertex were computed at every time step, following the rule
$m_{i \to i \pm 1}(t, x_{i \pm 1}) = \sum_{x_i} m_{i \mp 1 \to i}(t-1, x_i)\, \psi(x_i, x_{i \pm 1})$, as in eq. (103), with
the initialization $m_{i \pm 1 \to i}(0, x_i) = 1$. Then, one can easily verify that for $t > i$,
$m_{i \to i+1}(t, x_{i+1}) = M_{i \to i+1}(x_{i+1})$. In other words, the messages $m_{i \to i+1}$
become time independent after a time equal to the distance between vertex $i$ and the
beginning of the chain. This observation can in fact be generalized as follows.
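The stabilization of the messages under the parallel schedule can be illustrated with a short sketch. It is a classical toy version only: the function names and weights are hypothetical, vertices are labelled $1, \ldots, N$, and only the forward messages $m_{i \to i+1}$ are tracked, since on a chain they do not depend on the backward ones.

```python
def sequential_messages(psi, N, d):
    """Forward messages M_{i->i+1}, i = 1..N-1, computed in order along the chain."""
    messages, incoming = {}, [1] * d
    for i in range(1, N):
        incoming = [sum(psi[x][y] * incoming[x] for x in range(d)) for y in range(d)]
        messages[i] = incoming
    return messages


def parallel_forward_messages(psi, N, d, steps):
    """Update every forward message m_{i->i+1}(t) at once from the time t-1 values."""
    history = [{i: [1] * d for i in range(1, N)}]  # initialization m(0, x) = 1
    for _ in range(steps):
        previous, current = history[-1], {}
        for i in range(1, N):
            # Vertex 1 has no left neighbor, so its incoming message is trivial.
            incoming = previous[i - 1] if i > 1 else [1] * d
            current[i] = [sum(psi[x][y] * incoming[x] for x in range(d)) for y in range(d)]
        history.append(current)
    return history


psi = [[1, 2], [3, 1]]  # hypothetical d = 2 edge weights
N, d = 5, 2
fixed = sequential_messages(psi, N, d)
history = parallel_forward_messages(psi, N, d, steps=N)
for i in range(1, N):
    for t in range(i, N + 1):
        # From time step t = i onwards the parallel message agrees with M_{i->i+1}.
        assert history[t][i] == fixed[i]
```

In this sketch the message leaving vertex $i$ stops changing once information from the start of the chain has had time to reach it, which is the behavior that Lemma 5.3 generalizes to trees.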

Lemma 5.3. When $G$ is a tree, the QBP$^{(n)}$ messages $m^{(n)}_{u \to v}(t)$ are time
independent for $t >$ depth$(G^u_v)$, where $G^u_v$ is the tree obtained from $G$ by choosing $u$
as the root, and removing the subtree associated to $v$ (see Fig. 6).

Proof. The proof is by induction. If $u$ is a leaf, it has a unique neighbor $n(u)$ and
$m^{(n)}_{u \to n(u)}(t) \propto \mathrm{Tr}_u \left( \mu_u \star^{(n)} \nu_{u:n(u)} \right)$, which is time independent. If $u$ is not a leaf,
it has two neighbors $L(u)$ and $R(u)$. Clearly, if $m^{(n)}_{L(u) \to u}(t)$ is time independent
for $t > \tau$, then $m^{(n)}_{u \to R(u)}(t) = \mathrm{Tr}_u \left( \mu_u \star^{(n)} \left[ m^{(n)}_{L(u) \to u}(t-1)\, \nu_{u:R(u)} \right] \right)$ is
time independent for $t > \tau + 1$. □

When operated on a tree, all beliefs computed by the QBP algorithm converge
to a steady state after a time equal to the diameter of the tree. Note that when
the graph contains loops, the beliefs do not necessarily reach a steady state. It
remains to be shown that on trees, this steady state is the correct solution.
For this, we need a technical result that requires some new notation. Let $U$
and $W$ be two non-intersecting subsets of $V$. Define the two subsets of edges
$E_U = \{(u, w) \in E : u, w \in U\}$ and $E_{U:W} = \{(u, w) \in E : u \in U \text{ and } w \in W\}$.
Let $\Gamma_U = \bigotimes_{u \in U} \mu_u$ and for any $F \subseteq E$, let $\Lambda_F = \prod_{(u,w) \in F} \nu_{u:w}$.

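Theorem 5.4. Let $(G, \rho_V)$ be an n-Bifactor Network with graph $G = (V, E)$.
Let $U$, $W$, $X$ be non-intersecting subsets of $V$ such that $U \cup W \cup X = V$. When
$S(U : X | W) = 0$, the following diagram is commutative.

$$\begin{array}{ccc}
\Gamma_{U \cup W} \star^{(n)} \left( \Lambda_{E_{U \cup W}} \Lambda_{E_{U \cup W : X}} \right) & \xrightarrow{\ \Gamma_X \star^{(n)} (\,\cdot\; \Lambda_{E_X})\ } & \rho_V = \Gamma_V \star^{(n)} \Lambda_E \\
\downarrow \mathrm{Tr}_U & & \downarrow \mathrm{Tr}_U \\
\mathrm{Tr}_U \left[ \Gamma_{U \cup W} \star^{(n)} \left( \Lambda_{E_{U \cup W}} \Lambda_{E_{U \cup W : X}} \right) \right] & \xrightarrow{\ \Gamma_X \star^{(n)} (\,\cdot\; \Lambda_{E_X})\ } & \mathrm{Tr}_U [\rho_V]
\end{array} \qquad (107)$$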



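Proof. The down-right path is the simplest. The first equality follows from the
fact that $\Lambda_{E_X}$ commutes with $\Gamma_{U \cup W}$ and all other $\Lambda$'s, and the definition
$\rho_V = (\Gamma_{U \cup W} \otimes \Gamma_X) \star^{(n)} \left( \Lambda_{E_{U \cup W}} \Lambda_{E_{U \cup W : X}} \Lambda_{E_X} \right)$. The second equality is just a
definition. The right-down path uses the representation of states that saturate
strong subadditivity, eq. (27), which implies that $\rho_V$ has a decomposition of the
form $\rho_V = \sum_{j=1}^{d} p_j\, \sigma_{U W_j^{(1)}} \otimes \tau_{W_j^{(2)} X}$. First observe that

$$\Gamma_{U \cup W} \star^{(n)} \left( \Lambda_{E_{U \cup W}} \Lambda_{E_{U \cup W : X}} \right) = \left( \Gamma_X^{-1} \star^{(n)} \rho_V \right) \Lambda_{E_X}^{-1} \qquad (108)$$
$$= \sum_{j=1}^{d} p_j \left( \Gamma_X^{-1} \star^{(n)} \left[ \sigma_{U W_j^{(1)}} \otimes \tau_{W_j^{(2)} X} \right] \right) \Lambda_{E_X}^{-1} \qquad (109)$$
$$= \sum_{j=1}^{d} p_j\, \sigma_{U W_j^{(1)}} \otimes \left( \Gamma_X^{-1} \star^{(n)} \tau_{W_j^{(2)} X} \right) \Lambda_{E_X}^{-1}. \qquad (110)$$

It follows that

$$\Gamma_X \star^{(n)} \left( \mathrm{Tr}_U \left[ \Gamma_{U \cup W} \star^{(n)} \left( \Lambda_{E_{U \cup W}} \Lambda_{E_{U \cup W : X}} \right) \right] \Lambda_{E_X} \right) = \sum_{j=1}^{d} p_j\, \mathrm{Tr}_U \left( \sigma_{U W_j^{(1)}} \right) \otimes \tau_{W_j^{(2)} X} = \mathrm{Tr}_U (\rho_V).$$

□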



Specializing to the case $n = 1$ enables a stronger result to be derived that
does not require independence assumptions.

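Theorem 5.5. Let $(G, \rho_V)$ be a 1-Bifactor Network with graph $G = (V, E)$.
Let $U$, $W$, $X$ be non-intersecting subsets of $V$ such that $U \cup W \cup X = V$. The
following diagram is commutative.

$$\begin{array}{ccc}
\Gamma_U \star \left( \Lambda_{E_{U \cup W}} \Lambda_{E_{U \cup W : X}} \right) & \xrightarrow{\ \Gamma_{W \cup X} \star (\,\cdot\; \Lambda_{E_X})\ } & \rho_V = \Gamma_V \star \Lambda_E \\
\downarrow \mathrm{Tr}_U & & \downarrow \mathrm{Tr}_U \\
\mathrm{Tr}_U \left( \Gamma_U \star \left( \Lambda_{E_{U \cup W}} \Lambda_{E_{U \cup W : X}} \right) \right) & \xrightarrow{\ \Gamma_{W \cup X} \star (\,\cdot\; \Lambda_{E_X})\ } & \mathrm{Tr}_U (\rho_V)
\end{array} \qquad (111)$$

Proof. The theorem follows simply from the cyclic property of the partial trace:

$$\mathrm{Tr}_U (\rho_V) = \mathrm{Tr}_U \left( \left[ \Gamma_U^{1/2} \otimes \Gamma_{W \cup X}^{1/2} \right] \Lambda_E \left[ \Gamma_U^{1/2} \otimes \Gamma_{W \cup X}^{1/2} \right] \right) \qquad (112)$$
$$= \Gamma_{W \cup X}^{1/2}\, \mathrm{Tr}_U \left( \Gamma_U \Lambda_E \right) \Gamma_{W \cup X}^{1/2} \qquad (113)$$
$$= \Gamma_{W \cup X}^{1/2}\, \mathrm{Tr}_U \left( \Gamma_U \Lambda_{E_{U \cup W}} \Lambda_{E_{U \cup W : X}} \right) \Lambda_{E_X}\, \Gamma_{W \cup X}^{1/2}. \qquad (114)$$

□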

We are now positioned to state and prove the main result of this section. 

Theorem 5.6. Let $(G, \rho_V)$ be an n-Bifactor Network with graph $G = (V, E)$,
and let $[b^{(n)}_u(t), b^{(n)}_{uv}(t)] = \mathrm{QBP}^{(n)}_t(\mu_u, \nu_{u:v})$. If $(G, \rho_V)$ is a quantum Markov
network and $G$ is a tree, then for all $t >$ diameter$(G)$, $b^{(n)}_u(t) = \rho_u$ and
$b^{(n)}_{uv}(t) = \rho_{u \cup v}$.

Proof. First, observe that $b^{(n)}_u(t) = \mathrm{Tr}_v \left( b^{(n)}_{uv}(t) \right)$, so it is sufficient to prove
that $b^{(n)}_{uv}(t) = \rho_{u \cup v}$. Consider $u \cup v$ to be the root of the tree. We proceed
by induction, repeatedly tracing out leaves from the bifactor state except $u$
and $v$ until we are left with only vertices $u$ and $v$. Set $G(0) = G$ and let
$G(t) = (V(t), E(t))$ be the tree left after $t$ such rounds of removing leaves.
Denote the leaves of $G(t)$ apart from $u$ and $v$ by $l(t)$, the children of $x$ by $c(x)$,
and the unique parent of $x$ by $m(x)$. At $t = 0$, consider tracing out a leaf $w$ of
$G$:

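$$\mathrm{Tr}_w (\rho_V) = \mathrm{Tr}_w \left( (\mu_w \otimes \Gamma_{V-w}) \star^{(n)} \left( \nu_{w:m(w)} \Lambda_{E_{V-w}} \right) \right) \qquad (115)$$
$$= \Gamma_{V-w} \star^{(n)} \left[ \mathrm{Tr}_w \left( \mu_w \star^{(n)} \nu_{w:m(w)} \right) \Lambda_{E_{V-w}} \right] \qquad (116)$$
$$\propto \Gamma_{V-w} \star^{(n)} \left[ m^{(n)}_{w \to m(w)}(1)\, \Lambda_{E_{V-w}} \right] \qquad (117)$$

where we have used Theorem 5.4 in going from the first to the second line. Since
this holds for all leaves, we conclude that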



Tr,(o) (pv) - r^d) n n rn^y%mvii) ] ■ (118) 

\a:ei(0) yec{x) 

We thus make the inductive assumption that 

= ( n n <i.wAv(,) I . (119) 

\xei{t) yec{x) 



It follows that

$$\rho_{V(t+1)} = \mathrm{Tr}_{l(t)} \left( \rho_{V(t)} \right)$$



Tr 



lit) L v{t) 



n n 



mi;Zxit)^v{t) 



(120) 
(121) 



n n ™|,"i.W'^r.:m(.)Ay(t+i)) 



i:ei(t) 



i V(t+1) *^ ^ 



n Trj^^^Hf n 



A 



a;e/(t) 



= i V(t+1) * 



£ce/(t) 



n n ^t^x{t+i)kvit+i) 

xei{t+i) yec{x) 



(122) 

(123) 
(124) 

(125) 



also assumes the same form, so eq. (119) follows by induction. We have again
used Theorem 5.4 in going from the third to the fourth line. When $V(t)$ contains
only $u$ and $v$, this reduces to $\rho_{u \cup v} = b_{uv}(t)$, which is what we set out to
prove. □

Once again, specializing to the case $n = 1$ enables a stronger result to be
derived that does not rely on independence assumptions.

Corollary 5.7. Let $(G, \rho_V)$ be a 1-Bifactor Network with graph $G = (V, E)$,
and let $[b_u(t), b_{uv}(t)] = \mathrm{QBP}^{(1)}_t(\mu_u, \nu_{u:v})$. If $G$ is a tree, then for all $t >$
diameter$(G)$, $b_u(t) = \rho_u$ and $b_{uv}(t) = \rho_{u \cup v}$.

Proof. This Corollary is a consequence of Theorem 5.5 and the fact that the
proof of Theorem 5.6 only relies on the commutativity of the diagram eq. (107).

□

This last result gives us additional information about the structure of
correlations in 1-bifactor states that is captured by the following corollary.

Corollary 5.8. Let $(G, \rho_V)$ be a 1-Bifactor Network on graph $G = (V, E)$. If
$G$ is a tree, then the mutual density operators commute: $[\rho_{u:v}, \rho_{w:x}] = 0$ for all
$(u, v)$ and $(w, x) \in E$.

Proof. The only non trivial case is $[\rho_{u:v}, \rho_{v:w}]$ with $u \neq w$. Let $[b_u(t), b_{uv}(t)] =
\mathrm{QBP}^{(1)}_t(\mu_u, \nu_{u:v})$ and denote

$$A_{u-v}(t) = \prod_{w \in n(u) - v} m_{w \to u}(t). \qquad (126)$$

Observe that $A_{u-v}(t)$ is an operator on $\mathcal{H}_u$, and by Proposition 5.1, $[A_{u-v}(t), \nu_{u:w}] = 0$
for all $u$ and $w \in n(u)$. From Theorem 5.6 we have for $t >$ diameter$(G)$

□ 

Corollary 5.7 shows that for 1-bifactor states on trees, QBP$^{(1)}$ enables an
efficient evaluation of the one-vertex and two-vertex reduced density operators
$\rho_u$ for all $u \in V$ and $\rho_{u \cup v}$ for all $(u, v) \in E$. Can this result be generalized to
arbitrary bifactor states? This question is of interest since, as we will detail in
§7.2, the Gibbs states used in statistical physics are $\infty$-bifactor states. However,
it is known that approximating the ground state energy of a two-local
Hamiltonian on a chain is QMA-complete [AGK07a, Ira07a]. Knowledge of $\rho_{u \cup v}$ leads
to an efficient evaluation of the energy. Therefore, without any independence
assumptions, it is unlikely that an efficient QBP algorithm for n-Bifactor
Networks will converge to the correct marginals for $n > 1$. This contrasts with
classical BP, which always converges to the exact solution on trees. However, §6.3
gives a QBP algorithm that solves the inference problem for any n-bifactor state
on a tree in a time that scales exponentially with $n$.



5.3 Solving Inference Problems 

We close this section with a discussion of how the QBP algorithm can solve inference
problems when local measurements are executed on a bifactor state. In other
words, for an outcome of a local measurement on a subsystem $U$ described by a
POVM element $E_U^{(j)} = \bigotimes_{u \in U} E_u^{(j)}$, we are interested in evaluating the marginal
states $\rho_{u | E_U^{(j)}}$ and $\rho_{u \cup v | E_U^{(j)}}$ conditioned on the outcome, where

$$\rho_{u | E_U^{(j)}} = \frac{1}{Y}\, \mathrm{Tr}_{V - u} \left( \left( E_U^{(j)} \right)^{1/2} \rho_V \left( E_U^{(j)} \right)^{1/2} \right) \qquad (128)$$

$$\rho_{u \cup v | E_U^{(j)}} = \frac{1}{Y}\, \mathrm{Tr}_{V - \{u, v\}} \left( \left( E_U^{(j)} \right)^{1/2} \rho_V \left( E_U^{(j)} \right)^{1/2} \right), \qquad (129)$$

and $Y$ is a normalization factor. For $u, v \notin U$, this amounts to a local
modification of the bifactor state that accounts for the action of the measurement, the
QBP algorithm being otherwise unaltered. We focus on 1-Bifactor Networks
and return to the general case at the end of this section.

^QMA stands for Quantum Merlin and Arthur and it is the natural quantum generalization
of the classical complexity class NP. So to the best of our knowledge, solving a QMA-complete
problem would require an exponential amount of time even on a quantum computer.

Theorem 5.9. Let $(G, \rho_V)$ be a 1-Bifactor Network with $G = (V, E)$ a tree.
For $U \subset V$, let $\{E_U^{(j)}\} = \{\bigotimes_{u \in U} E_u^{(j)}\}$ be a POVM on the subsystem $U$ and
let $W = V - U$. Define $\mu'_u = \mu_u \star E_u^{(j)}$ for $u \in U$ and $\mu'_u = \mu_u$ for $u \in W$,
and let $[b_u(t), b_{uv}(t)] = \mathrm{QBP}^{(1)}_t(\mu'_u, \nu_{u:v})$. Then for all $t >$ diameter$(G)$,
$b_u(t) = \rho_{u | E_U^{(j)}}$ for all $u \in W$ and $b_{uv}(t) = \rho_{u \cup v | E_U^{(j)}}$ for all $u, v \in W$ with $(u, v) \in E$.



Proof. The reduced state on $W$ conditioned on the measurement outcome $E_U^{(j)}$
is given by



U) 



v^W (w,x)eE 



vew (w,x)eE 

= yl[ n {f^y'f^^u({i^i^^p.:x{,. 

vew {w,x)eE ^ 
The result thus follows from Corollary 5.7.



U) 



(130) 
(131) 

(132) 

(133) 
□ 



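The result of Theorem 5.9 can easily be extended to compute the conditional
marginal states $\rho_{u | E_U^{(j)}}$ and $\rho_{u \cup v | E_U^{(j)}}$ for any $u, v \in V$, not just those in $W = V - U$.
This is achieved by altering the beliefs as follows

$$b_u(t) = \frac{1}{Y}\, E_u^{(j)} \star \left( \mu_u \star \prod_{u' \in n(u)} m_{u' \to u}(t) \right) \quad \text{for } u \in U, \qquad (134)$$

$$b_{uv}(t) = \frac{1}{Y}\, E_{uv} \star \left( (\mu_u \otimes \mu_v) \star \left[ \nu_{u:v} \prod_{w \in n(u) - v} m_{w \to u}(t) \prod_{w' \in n(v) - u} m_{w' \to v}(t) \right] \right) \qquad (135)$$

with $E_{uv} = E_u^{(j)} \otimes I_v$ when $u \in U$ and $v \in W$, and $E_{uv} = E_u^{(j)} \otimes E_v^{(j)}$ when
$u, v \in U$. The proof is straightforward and we omit it.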

Theorem 5.9 shows how QBP leads to an efficient algorithm for solving
inference problems on 1-bifactor states on trees with local measurements. This
immediately implies an efficient algorithm for general n-bifactor states when $(G, \rho_V)$
is a quantum Markov network. Indeed, Theorem 5.6 demonstrates that in that
case the QBP$^{(n)}$ algorithm can be used to efficiently compute the marginal
density operators $\rho_{u \cup v}$ for all $(u, v) \in E$. From these, one can straightforwardly
obtain the marginal operators $\rho_u$ for all $u \in V$ and mutual operators $\rho_{u:v}$ for
all $(u, v) \in E$. Theorem 4.10 states that $\rho_V$ can be represented as a 1-bifactor
state in terms of its marginal and mutual operators. The inference problem can
then be solved using the QBP$^{(1)}$ algorithm as explained above.

6 Heuristic Methods



The previous section provided conditions under which QBP algorithms give
exact solutions to inference problems on n-Bifactor Networks. Namely, the
underlying graph must be a tree, and the state must be either a quantum Markov
network or a 1-bifactor state. When these conditions are not met, QBP
algorithms may still be used as heuristic methods to obtain approximate solutions
to the inference problem, although in general these approximations will be
uncontrolled.

To draw a parallel, classical Belief Propagation algorithms have found
applications in numerous distinct scientific fields where they are sometimes known
under different names: Gallager decoding, Viterbi's algorithm, sum-product, and
iterative turbo decoding in information theory; the cavity method and the
Bethe-Peierls approximation in statistical physics; the junction-tree and Shafer-Shenoy
algorithms in machine learning, to name a few. In many of these examples, BP
algorithms exhibit good performance on graphs with loops, even though the
algorithm does not converge to the exact solution on such graphs. In fact,
"Loopy Belief Propagation" is often the best known heuristic method to find
approximate solutions to hard problems. Important examples include the
near-Shannon capacity achieving turbo-codes and low density parity check codes. On
the other hand, there are known examples for which loopy BP fails to converge,
and its general realm of applicability is not yet fully understood.

As in the classical case, one can expect loopy QBP to give reasonable approx- 
imations in some circumstances, for instance when the size of typical loops is 
very large. Intuitively, one expects a local algorithm to be relatively insensitive 
to the large scale structure of the underlying graph. However, quantum infer- 
ence problems also pose a new challenge. Quite apart from issues regarding the 
graph's topology, an n-bifactor state with n > 1 may not obey the independence 
conditions required to ensure the convergence of QBP. The goal of this section 
is to suggest three techniques that are expected to improve the performance of 
QBP in such circumstances. 

6.1 Coarse-graining 

By definition, a quantum Markov network has the property that the correlations 
from one vertex to the rest of the graph are screened off by its neighbors. When 
this property fails, QBP will not in general produce the correct solution to an 
inference problem. Coarse graining is a simple way of modifying a graph in such
a way that the state may be closer to forming a quantum Markov network
with respect to the new graph than it was with respect to the original graph.

A coarse graining of a graph $G = (V, E)$ is a graph $\tilde{G} = (\tilde{V}, \tilde{E})$, where $\tilde{V}$
is a partition of $V$ into disjoint subsets and $(U, W) \in \tilde{E}$ if there is an edge
connecting a vertex in $U$ to a vertex in $W$ in $G$. The coarse-grainings that are of
most interest are those that partition $V$ into connected sets of vertices (see Fig. 7
for example). It is an elementary exercise to show that if $(G, \rho_V)$ is an n-Bifactor
Network, then $(\tilde{G}, \rho_V)$ is an n-Bifactor Network for any coarse graining $\tilde{G}$. The
intuition for why coarse graining might get us closer to a Markov network is
that it effectively "thickens" the neighborhood of each vertex, which may then
be more efficient at screening off correlations. This intuition is illustrated in
Fig. 7 and is supported by the fact that Markov networks are fixed points of
the coarse graining procedure, i.e. if $\tilde{G}$ is a coarse-graining of $G$, then $(\tilde{G}, \rho_V)$
is a quantum Markov network whenever $(G, \rho_V)$ is a Markov network.



a) b) 

Figure 7: Example of a coarse-grained graph. Figure a) shows in light gray the 
neighborhood of the darkened vertex in the original graph. In b) the dashed 
ellipses represent coarse-grained vertices. The neighborhood of the darkened 
coarse-grained vertex is represented by the light gray set. 

Also note that every graph $G$ can be turned into a tree by a suitable coarse
graining. When the obtained Bifactor Network is a Markov Network or when
$n = 1$, QBP is then guaranteed to converge to the exact solution. The Hilbert
space dimension at the vertices of the coarse-grained graph is bounded by an
exponential in the tree-width of $G$, so this technique is efficient only for graphs
of $O(\log N)$ tree-width.
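The purely graph-theoretic part of coarse graining is easy to make concrete. The sketch below is illustrative only: `coarse_grain`, `is_tree` and the ladder example are hypothetical and not taken from the text. It builds the edge set of the coarse-grained graph from a partition of the vertices and checks that grouping the rungs of a ladder, which contains loops, produces a chain.

```python
def coarse_grain(edges, blocks):
    """Edge set of the coarse-grained graph.

    edges  : iterable of (u, w) pairs of the original graph
    blocks : list of disjoint vertex sets forming a partition of V
    Two blocks are joined whenever some original edge connects them.
    """
    block_of = {v: index for index, block in enumerate(blocks) for v in block}
    coarse = set()
    for u, w in edges:
        a, b = block_of[u], block_of[w]
        if a != b:
            coarse.add((min(a, b), max(a, b)))
    return coarse


def is_tree(num_vertices, edges):
    """A connected graph on vertices 0..num_vertices-1 with |V| - 1 edges is a tree."""
    if len(edges) != num_vertices - 1:
        return False
    neighbors = {v: set() for v in range(num_vertices)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for nxt in neighbors[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == num_vertices


# Hypothetical example: a 2 x 4 ladder (vertices 0-3 on top, 4-7 below).
ladder = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7),
          (0, 4), (1, 5), (2, 6), (3, 7)]
rungs = [{0, 4}, {1, 5}, {2, 6}, {3, 7}]
coarse_edges = coarse_grain(ladder, rungs)
assert not is_tree(8, ladder)
assert coarse_edges == {(0, 1), (1, 2), (2, 3)}
assert is_tree(4, coarse_edges)
```

Each coarse vertex here holds two original systems, so its Hilbert space dimension is squared; this is the price paid for removing the loops.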

6.2 Sliding window QBP 

Sliding window QBP is similar in spirit to coarse-graining but is mainly suitable
for chains (although the idea is easily generalized to arbitrary trees of low
degree). Consider an n-bifactor state $\rho_V$ on a one dimensional lattice $G = (V, E)$
with $V = \{v_1, v_2, \ldots, v_N\}$ and $E = \{(v_j, v_{j+1})\}_{j=1,\ldots,N-1}$. When $(G, \rho_V)$ is not
a quantum Markov Network, the diagram of eq. (107) will generally fail to be
commutative. The commutativity of this diagram is essential for the success of
QBP, as for instance it implies



Tr 



(136)" 

Thus, the Hilbert space of vertex $v_1$ is traced out before operators on vertex $v_3$
are brought into the picture. This enables the algorithm to progress along the
lattice by evaluating a cumulative operator of constant dimension (i.e. the
messages), much in the spirit of the transfer matrix of statistical physics. Without
the Markov property, this is generally not possible.

However, when vertices separated by a distance $\ell$ are conditionally
independent given the vertices between them, sliding window QBP can be operated
efficiently to produce the exact solution of the inference problem. This works
by defining new message operators




(138) 



which act on Hv^^^ ^ ® • ■ • "^uj+j • When 

$$S(v_j : v_{j+\ell} \,|\, \{v_{j+1}, v_{j+2}, \ldots, v_{j+\ell-1}\}) = 0 \qquad (139)$$
for all $v_j \in V$, we have the equality

so inference problems can be solved exactly with operators whose dimension
grows exponentially with $\ell$ rather than the lattice size $N$. In particular,
this method can be applied to spin systems that have a finite correlation length
because then eq. (139) can be expected to hold approximately for some finite $\ell$.

6.3 Replicas 

The method of replicas maps n-bifactor states to 1-bifactor states on which
QBP$^{(1)}$ can be implemented without concerns for independence. This is achieved
by replacing the system $v$ on each vertex of the graph $G$ by $n$ replicas, so that
the Hilbert space associated to vertex $v$ becomes $\mathcal{H}_v^{\otimes n}$. As a consequence, the
algorithm suffers an overhead exponential in $n$. The name "replica" is borrowed
from the analogous technique used in the study of classical quenched disordered
systems. The validity of this technique is based on the following observation.

Proposition 6.1. Let $\{\mathcal{H}_j\}_{j=1,\ldots,n}$ be isomorphic Hilbert spaces. Let $T^{(n)}$ be
the operator that cyclically permutes these $n$ systems. Let $A_1$ be an arbitrary
operator on $\mathcal{H}_1$, and define $A_j = (T^{(n)})^{j-1} A_1 (T^{(n)\dagger})^{j-1}$ to be the corresponding
operators on $\mathcal{H}_j$. Then for any set of operators $\{A_1^{(k)}\}$ on $\mathcal{H}_1$, the following
equality holds

$$A_1^{(1)} A_1^{(2)} \cdots A_1^{(n)} = \mathrm{Tr}_{2,3,\ldots,n} \left( \left[ A_1^{(1)} \otimes A_2^{(2)} \otimes \cdots \otimes A_n^{(n)} \right] T^{(n)} \right). \qquad (141)$$
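The identity of Proposition 6.1 can be verified by direct index manipulation on small matrices. The sketch below is a check under an explicit assumption that is not fixed by the text: the cyclic permutation is taken to act on basis states as $T^{(n)} |j_1, j_2, \ldots, j_n\rangle = |j_2, \ldots, j_n, j_1\rangle$ (the opposite convention reverses the order of the product). The function names and test matrices are hypothetical.

```python
from itertools import product


def matmul(a, b):
    """Ordinary product of two square matrices given as nested lists."""
    d = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]


def replica_trace(ops):
    """Tr_{2..n}[(A^(1) x ... x A^(n)) T] for d x d matrices ops = [A^(1), ..., A^(n)].

    Assumed convention: T|j1, ..., jn> = |j2, ..., jn, j1>.
    """
    n, d = len(ops), len(ops[0])
    out = [[0] * d for _ in range(d)]
    for i1 in range(d):
        for j in product(range(d), repeat=n):  # j = (j1, ..., jn), the input basis state
            shifted = j[1:] + j[:1]            # basis state produced by T
            row = (i1,) + j[1:]                # tracing systems 2..n sets i_k = j_k
            term = 1
            for k in range(n):
                term *= ops[k][row[k]][shifted[k]]
            out[i1][j[0]] += term
    return out


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]
assert replica_trace([A, B]) == matmul(A, B)
assert replica_trace([A, B, C]) == matmul(matmul(A, B), C)
```

For $n = 2$ the permutation is the swap operator, so the convention plays no role; it matters only from $n = 3$ onwards.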

We are now in a position to formalize the replica method. 

Theorem 6.2. Let $(G, \rho_V)$ be an n-Bifactor Network, with operators $\mu_u$ and
$\nu_{u:v}$. Then, $\rho_V$ is locally isomorphic to a 1-bifactor state with Hilbert spaces
comprising $n$ replicas of the original system, $\mathcal{H}_u^{\otimes n} = \mathcal{H}_{u_1} \otimes \mathcal{H}_{u_2} \otimes \cdots \otimes \mathcal{H}_{u_n}$ for all $u \in V$.
The partial isomorphism at vertex $u$ is given by $\mathrm{Tr}_{u_2, u_3, \ldots, u_n} \left( \,\cdot\; T_u^{(n)} \right)$.
More precisely, we claim that



PV = Tr[u2,U3,...,u„}^ev 



( I * n ^ 

\«ev / \{v,w)eE J J 



where 



t/-(g)(ri"))^ 



(142) 

(143) 
(144) 
(145) 



are operators on Ti!^ 



Proof. First, note that commutes with [[lu] , so /iu = I /iu" ) \Tu \ 

1 \ iJSin 



Thus 



Tr 



{lt2,tl3,---,Mre}ueV 



VuGV / \(t;,iu)e£; / / 



(146) 



Tr 



{"2, "3, 



o..,(a'foT<...(.i)*")J n (4.)' 



U 



(147) 



Tr 



\uev 



rp(n) 



(148) 



Tr 



{"2,M3,--->"7i}ueV 



Mm I * I n 

\uev / \(t),ii;)e_E 



Ti") (149) 



/in I * I 



PV 



(150) 



where we used Proposition 6.1 to obtain the last line. □

Since the dimension of the Hilbert space at each vertex grows exponentially with
$n$, the QBP$^{(1)}$ algorithm used to solve the corresponding inference problem
suffers an exponential overhead. One can make a replica symmetry ansatz,
assuming that the state is symmetric under exchange of replica systems at any
given vertex. Since the symmetric subspace of $\mathcal{H}_u^{\otimes n}$ grows polynomially with
$n$, the QBP algorithm can be executed efficiently. The validity of this ansatz cannot
be verified in general, but it may serve as a good heuristic method.
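To give a feeling for the saving offered by the replica symmetry ansatz, the following sketch compares the dimension of the full replica space with that of its symmetric subspace, using the standard count $\binom{n+d-1}{d-1}$ for $n$ copies of a $d$-dimensional system. The function name is hypothetical.

```python
from math import comb


def symmetric_dimension(d, n):
    """Dimension of the symmetric subspace of n copies of a d-dimensional system."""
    return comb(n + d - 1, d - 1)


# Qubits (d = 2): the symmetric subspace has dimension n + 1, the full space 2**n.
for n in (1, 2, 4, 8, 16):
    print(n, symmetric_dimension(2, n), 2 ** n)
```

For fixed $d$ the first count is a polynomial of degree $d - 1$ in $n$, whereas the second grows exponentially.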

7 Applications 

This section explains in some detail how QBP can be used as a heuristic algo- 
rithm to find approximate solutions to important problems in quantum error 
correction and the simulation of many-body quantum systems. The focus will 
be on the reduction of well established problems to inference problems on n- 
Bifactor Networks. One can make use of the techniques discussed in the previous 
section whenever the resulting Graphical Model does not meet the requirements 
to ensure convergence of QBP, or when these conditions cannot be verified effi- 
ciently. 

7.1 Quantum Error Correction 

Maximum-likelihood decoding is an important task in quantum error correction
(QEC). As in classical error correction, this problem reduces to the evaluation
of marginals on a factor graph, also called a Tanner graph in this context. More
precisely, for independent error models, the quantum channel conditioned on
the error syndrome is a 1-bifactor state. As a consequence, qubit-wise maximum
likelihood decoding of a QEC stabilizer code reduces to an inference problem on
a 1-Bifactor Network. Thus, there is no independence condition that needs to be
verified, although the graph will generally contain loops. Before demonstrating
this reduction, a brief summary of stabilizer QEC is in order; see [Got97a] for
more details. For details on the use of Belief Propagation for the decoding of
classical error correction codes, the reader is referred to the text of MacKay
[Mac03a] and the forthcoming book of Richardson and Urbanke [RU05a].

Consider a collection of $N$ two-dimensional quantum systems (qubits) $V =
\{u\}_{u=1,\ldots,N}$ with $\mathcal{H}_u = \mathbb{C}^2$. A QEC code is a subspace $C \subseteq \mathcal{H}_V$ that is the
$+1$ eigensubspace of a collection of commuting operators $S_j$, $j = 1, \ldots, N - K$,
called stabilizer generators. Each stabilizer generator is a tensor product of
Pauli operators on a subset $U_j$ of $V$:

$$S_j = \bigotimes_{u \in U_j} \sigma_u^{\alpha_u^j} \qquad (151)$$

where $\alpha_u^j \in \{x, y, z\}$. When the stabilizer generators are multiplicatively
independent, the code encodes $K$ qubits, i.e. $C$ has dimension $2^K$. For each
$j = 1, \ldots, N - K$, define the two projectors $P_j^{\pm} = (I \pm S_j)/2$. The code space is
therefore defined as $C = \left( \prod_j P_j^{+} \right) \mathcal{H}_V$.

^More precisely, it grows as $\binom{n+d-1}{d-1} \sim n^{d-1}$.

Error correction consists of three steps. First, the system $V$ is prepared in
a code state $\rho_V$ supported on $C$, in such a way that $\rho_V P_j^{+} = \rho_V$ for all
$j$. The state is then subjected to the channel $\rho_V \to \mathcal{E}_{V|V}(\rho_V)$. Second, each
stabilizer generator $S_j$ is measured, yielding an outcome $s_j = \pm$ with probability
$\mathrm{Tr} \left( P_j^{\pm} \mathcal{E}_{V|V}(\rho_V) \right)$. The collection of all $N - K$ measurement outcomes $s_j$, called
the error syndrome, is denoted $s = (s_1, s_2, \ldots, s_{N-K}) \in \{-, +\}^{N-K}$. Third,
the channel $\mathcal{E}_{V|V}$ is updated conditioned on the error syndrome $s$. Based on this
updated channel, the optimal recovery is computed and implemented.

The computationally difficult step in the above protocol consists in
conditioning the channel on the error syndrome. To understand this problem, it is
useful to express the channel in a Kraus form $\mathcal{E}_{V|V}(\rho_V) = \sum_k M^{(k)}_{V|V}\, \rho_V\, M^{(k)\dagger}_{V|V}$,
where $\{M^{(k)}_{V|V}\}$ are operators on $\mathcal{H}_V$. When $s_j = +$, we learn that the error
that has affected the state commutes with $S_j$, while $s_j = -$ indicates that the
error anti-commutes with $S_j$. To update the channel conditioned on the
error syndrome $s_j = +$ say, we first decompose each Kraus operator $M^{(k)}_{V|V}$ as
the sum of an operator that commutes with $S_j$ and an operator that does not
commute with $S_j$: $M^{(k)}_{V|V} = M^{(k)+}_{V|V} + M'^{(k)}_{V|V}$, where $M^{(k)+}_{V|V} = P_j^{+} M^{(k)}_{V|V} P_j^{+}$ and
$M'^{(k)}_{V|V} = M^{(k)}_{V|V} - M^{(k)+}_{V|V}$. The updated channel is obtained by throwing away
the primed component $M'^{(k)}_{V|V}$ of each Kraus operator, and renormalizing.
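For Pauli errors, the relation between an error and the syndrome it produces reduces to counting anti-commuting sites. The sketch below illustrates this with Pauli operators written as strings over `I`, `X`, `Y`, `Z`; the function names and the choice of the three-qubit bit-flip code are hypothetical and serve only as an example.

```python
def commutes(p, q):
    """Two Pauli strings commute iff they differ on an even number of sites
    at which both act non-trivially."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def syndrome(error, generators):
    """Outcome '+' for each generator the error commutes with, '-' otherwise."""
    return tuple("+" if commutes(error, s) else "-" for s in generators)


# Hypothetical example: three-qubit bit-flip code with generators ZZI and IZZ.
generators = ["ZZI", "IZZ"]
assert syndrome("III", generators) == ("+", "+")
assert syndrome("XII", generators) == ("-", "+")
assert syndrome("IXI", generators) == ("-", "-")
assert syndrome("IIX", generators) == ("+", "-")
assert syndrome("ZII", generators) == ("+", "+")  # phase flips go undetected here
```

Several distinct errors can share a syndrome, which is why decoding requires an inference step over the conditioned channel rather than a simple table lookup.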

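In what follows, we demonstrate how the conditional channel can be
expressed as a factor graph. This is most easily done using the Jamiołkowski
representation of quantum channels. For each quantum system $v$, let $R_v$ denote
a reference for $v$, with Hilbert space $\mathcal{H}_{R_v} \simeq \mathcal{H}_v$. Define the maximally
entangled state between system $v$ and its reference by $|\Phi\rangle_{v R_v} = \frac{1}{\sqrt{2}} \sum_j |j\rangle_v |j\rangle_{R_v}$.
Then, the Jamiołkowski representation of a channel $\mathcal{E}_{V|V}$ is a density operator
$\rho_{\bar{V}}$ on $\mathcal{H}_{\bar{V}} = \mathcal{H}_V \otimes \mathcal{H}_{R_V}$ given by $\rho_{\bar{V}} = (\mathcal{E}_{V|V} \otimes \mathcal{I}_{R_V|R_V})(|\Phi\rangle\langle\Phi|_{V R_V})$, where
$\mathcal{I}$ denotes the identity channel. For the independent error models considered here,
$\rho_{\bar{V}} = \bigotimes_{u \in V} \rho_{\bar{u}}$.

For each stabilizer generator $S_j$, denote $\bar{S}_j = \bigotimes_{u \in U_j} \sigma_u^{\alpha_u^j} \otimes \left( \sigma_{R_u}^{\alpha_u^j} \right)^{*}$, and
construct the associated projectors $\bar{P}_j^{\pm} = (I \pm \bar{S}_j)/2$. An important property
of these operators is that they fix the maximally entangled state: $\bar{S}_j |\Phi\rangle_{V R_V} =
\bar{P}_j^{+} |\Phi\rangle_{V R_V} = |\Phi\rangle_{V R_V}$. Let $E$ be an operator on $V$. If $E$ commutes with $S_j$, we
have $\bar{P}_j^{+} (E \otimes I_{R_V}) |\Phi\rangle_{V R_V} = (E \otimes I_{R_V}) |\Phi\rangle_{V R_V}$ and $\bar{P}_j^{-} (E \otimes I_{R_V}) |\Phi\rangle_{V R_V} = 0$,
while if $E$ anti-commutes with $S_j$, the same identities hold with $\bar{P}_j^{+}$ and $\bar{P}_j^{-}$
exchanged. It follows from this observation that conditioned on the error
syndrome $s$, the channel is described by the Jamiołkowski matrix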



Pv\s z 

j vev 

that is a quantum factor graph. 

There are a number of relevant quantities that can be evaluated from this
factor graph. For instance, one can efficiently evaluate the conditional channel
on any constant size set of qubits $W \subset V$ via partial trace. This is useful in
iterative decoding schemes such as those used for quantum turbo-codes [OPT07a]
and low density parity check codes [COT05a]. In those cases, the conditional
channel on $W$ can only be evaluated approximately since it requires loopy QBP.
The factor graph also enables exact evaluation of the logical error in a
concatenated block coding scheme [Pou06b] such as used in fault-tolerant protocols.

7.2 Simulation of Many-Body Quantum Systems 

In statistical physics, the state of a many-body quantum system $V$ is a Gibbs
state $\rho_V = \exp(-\beta H)$ for some Hamiltonian $H$, where $\beta = 1/T$ is the
inverse temperature. Typically, $H$ is the sum of single and two-body interactions
$H = \sum_{u \in V} H_u + \sum_{(u,w) \in E} H_{uw}$ on some graph $G = (V, E)$. Understanding the
correlations present in these states is a great challenge in theoretical physics. In
this section, we describe how QBP can serve as a heuristic method to
accomplish this task approximately. For an account of the use of Belief Propagation
in classical statistical mechanical systems, we refer the reader to the text of
Mézard and Montanari [MM07a].

Defining $\mu_u = \exp(-\beta H_u)$ and $\nu_{u:w} = \exp(-\beta H_{uw})$ gives an expression for
Pv of the form of eq. ([M)) : 




(153) 



Thus, $\rho_V$ is an $\infty$-bifactor state. As mentioned earlier, a QBP$^{(\infty)}$ algorithm can
easily be formulated for this type of bifactor state, and it still converges to the exact
solutions of the corresponding inference problem when $\rho_V$ is a quantum Markov
network and $G$ is a tree. This requires replacing all matrix products $\prod$ by the
commutative product $\odot$ in the defining equations of QBP$^{(\infty)}$, eqs. (103)-(105).
The proof of convergence, Theorem 5.6, under these more general conditions
follows essentially the same reasoning.

To obtain a bifactor state that satisfies the commutation condition $[\nu_{u:v}, \nu_{w:x}] =
0$, it is possible to coarse-grain $G$ in a way that the resulting interactions between
coarse-grained neighbors commute. Consider for instance a one dimensional
chain $G = (V, E)$ with $V = \{u\}_{u=1,\ldots,N}$ and $E = \{(u, u+1)\}_{u=1,\ldots,N-1}$. We can
construct a coarse-grained graph $\tilde{G}$ by identifying all vertices $2u - 1$ and $2u$ for
$u = 1, \ldots, \lfloor N/2 \rfloor$. The state $\rho_V$ is then an $\infty$-bifactor state on $\tilde{G}$, with operators

$$\tilde{\mu}_u = \mu_{2u-1} \odot \mu_{2u} \odot \nu_{2u-1:2u} \qquad (154)$$
$$\tilde{\nu}_{u:u+1} = \nu_{2u:2u+1}, \qquad (155)$$

satisfying $[\tilde{\nu}_{u:u+1}, \tilde{\nu}_{v:v+1}] = 0$. Thus, $\infty$-bifactor states are commonplace in quantum
many-body physics. Unfortunately, the convergence of the QBP algorithm in
this case requires the state to be a quantum Markov network, which cannot be
tested directly in general. As we will now explain, it is often possible to
reasonably approximate a Gibbs state by an n-bifactor state with finite $n$, and sometimes
even $n = 1$.

A simple way to obtain an n-bifactor state is to approximate $\odot$ by $\star^{(n)}$ for
some large value of $n$. In the context of many-body physics, this is called a
Trotter-Suzuki decomposition of the Gibbs state, and becomes more accurate
as the ratio $\beta/n$ decreases. The QBP$^{(n)}$ algorithm can then be operated on
this n-bifactor state, but its convergence again requires some independence
condition that cannot be verified systematically. Alternatively, one can use the
replica method described in section 6.3 and solve the inference problem exactly
with QBP$^{(1)}$, but with an increase in complexity exponential in $n$. The replica
method is then reminiscent of the well known correspondence between quantum
statistical mechanics in $d$ dimensions and classical statistical mechanics in $d + 1$
dimensions, where the extra dimension represents inverse temperature.
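As an illustration of the approximation involved, consider a single vertex term and a single edge term. The display below is a sketch that assumes the definition $A \star^{(n)} B = (A^{1/2n} B^{1/n} A^{1/2n})^n$ for the $\star^{(n)}$ product and the identification $A \odot B = \exp(\log A + \log B)$, applied to the operators $\mu_u = e^{-\beta H_u}$ and $\nu_{u:w} = e^{-\beta H_{uw}}$ introduced above.

```latex
\begin{align*}
\mu_u \odot \nu_{u:w}
  &= e^{-\beta (H_u + H_{uw})} \\
  &= \lim_{n \to \infty}
     \left( e^{-\beta H_u / 2n}\, e^{-\beta H_{uw} / n}\, e^{-\beta H_u / 2n} \right)^{n}
   = \lim_{n \to \infty} \mu_u \star^{(n)} \nu_{u:w} .
\end{align*}
```

For bounded $H_u$ and $H_{uw}$, this symmetric splitting is the standard second-order Trotter-Suzuki formula, whose error at finite $n$ is of order $\beta^3 / n^2$; this is consistent with the approximation improving as $\beta/n$ decreases.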

The 1-bifactor states also capture the correlations of some non-trivial
quantum many-body systems. Valence bond solid (VBS) states were introduced
in Refs. [AKLT87a, AKLT88a] as exact ground states (i.e. $T = 0$ Gibbs
states) of spin systems with interesting properties. Recent work has generalized
these constructions to matrix product states (MPS) in one dimension [FNW92a,
Vid04a, Vid06a], and projected entangled-pair states (PEPS) for higher
dimensions [VC04a, SDV06a]. These form an important class of states for the
description of quantum many-body systems. For instance, the density matrix
renormalization group (DMRG) [Whi92a], one of the most successful methods for the
numerical study of spin chains, is now understood as a variational method over
MPS [OR95a, DMNS98a, VPC04b]. All these states are instances of 1-bifactor
states.



Figure 8: Projected entangled pair state on a two-dimensional square lattice.
The vertices are associated to dashed circles. Each •—• represents a maximally
entangled state of dimension $D$ shared between neighboring vertices. A partial
isometry $A_u : (\mathbb{C}^D)^{\otimes c_u} \to \mathbb{C}^d$ is applied at each vertex, where $c_u$ is the degree of
vertex $u$.

For the sake of simplicity, we will demonstrate this claim for one-dimensional
MPS, but the same argument holds for higher dimensions. The MPS $|\Psi\rangle$ is a
pure state of a collection of $N$ $d$-dimensional quantum systems arranged on a
one dimensional lattice. Each vertex $u$ is assigned two "virtual particles" $L_u$
and $R_u$, where $L$ and $R$ stand for left and right (see Fig. 8 for an illustration of
this construction in two dimensions). Each of these particles is associated with a
Hilbert space $\mathcal{H}_{L_u} = \mathcal{H}_{R_u} = \mathbb{C}^D$. Initially, the right particle of vertex $u$ is in a
maximally entangled state with the left particle of vertex $u + 1$,

$$|\Phi\rangle_{R_u L_{u+1}} \propto \sum_{\alpha=1}^{D} |\alpha\rangle_{R_u} |\alpha\rangle_{L_{u+1}},$$

where $|\alpha\rangle$ are orthogonal basis vectors for $\mathbb{C}^D$. (The
lattice can be closed to form a circle, in which case we identify $N + 1 = 1$.) The
initial state is therefore $|\Phi_0\rangle = \bigotimes_u |\Phi\rangle_{R_u L_{u+1}}$.

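To obtain the MPS, apply an operator $A_u : \mathcal{H}_{L_u} \otimes \mathcal{H}_{R_u} \to \mathbb{C}^d$,

$$A_u = \sum_{j=1}^{d} \sum_{\alpha,\beta=1}^{D} A^{u,j}_{\alpha\beta} \, |j\rangle\langle\alpha, \beta| \qquad (156)$$

to each vertex of the lattice. The vectors $|j\rangle$ form an orthogonal basis for $\mathbb{C}^d$.
The resulting state is

$$|\Psi\rangle = \bigotimes_{u=1}^{N} A_u |\Phi_0\rangle \propto \sum_{j_1,\ldots,j_N=1}^{d} \mathrm{Tr}\left(B_1^{j_1} B_2^{j_2} \cdots B_N^{j_N}\right) |j_1, j_2, \ldots, j_N\rangle \qquad (157)$$

where the matrices $B_u^{j}$ are the submatrices of $A_u$ with matrix elements $(B_u^{j})_{\alpha\beta} = A^{u,j}_{\alpha\beta}$.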
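
As a sanity check of eq. (157), the following sketch builds the state both ways for a small closed chain: by applying the vertex operators to the entangled pairs, and by evaluating the trace formula. It is an illustration under stated assumptions rather than part of the construction above: the sizes `N = 3`, `d = 2`, `D = 2`, the random real tensors, and the use of unnormalized pairs $\sum_\alpha |\alpha\rangle|\alpha\rangle$ are all choices made here.

```python
import numpy as np
from itertools import product

N, d, D = 3, 2, 2                      # illustrative sizes (assumptions)
rng = np.random.default_rng(1)

# A[u][j, alpha, beta]: coefficients of the vertex operator A_u in eq. (156);
# alpha labels the left virtual particle L_u, beta the right one R_u.
A = [rng.normal(size=(d, D, D)) for _ in range(N)]

# |Phi_0> on a closed chain, as a tensor with indices (L1, R1, L2, R2, L3, R3).
# Each unnormalized pair sum_alpha |alpha>|alpha> is a Kronecker delta linking
# R_u to L_{u+1}, with R_3 linked back to L_1.
eye = np.eye(D)
phi0 = np.einsum('qr,st,up->pqrstu', eye, eye, eye)

# Apply A_1 (x) A_2 (x) A_3 to |Phi_0>.
psi = np.einsum('apq,brs,ctu,pqrstu->abc', A[0], A[1], A[2], phi0)

# Trace formula: amplitude of |j1, j2, j3> is Tr(B_1^{j1} B_2^{j2} B_3^{j3}).
psi_trace = np.zeros((d,) * N)
for js in product(range(d), repeat=N):
    M = np.eye(D)
    for u, j in enumerate(js):
        M = M @ A[u][j]                # A[u][j] is the D x D submatrix B_u^j
    psi_trace[js] = np.trace(M)

print(np.allclose(psi, psi_trace))
```

With unnormalized pairs the two expressions agree exactly; with normalized pairs they differ by the overall constant hidden in the $\propto$ of eq. (157).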

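For the corresponding 1-bifactor state, the underlying graph $G = (V, E)$ is
also a one dimensional lattice, $V = \{1, 2, \ldots, N\}$ and $E = \{(1, 2), (2, 3), \ldots, (N-1, N)\}$.
The Hilbert space associated to vertex $u$ is $\mathcal{H}_u = \mathbb{C}^D \otimes \mathbb{C}^D$. As above,
it is convenient to imagine that each vertex $u$ is composed of two $D$-dimensional
subsystems $L_u$ and $R_u$. Then, up to a local isometry, the MPS of eq. (157) can
be expressed as a 1-bifactor state with

$$\mu_u = A_u^\dagger A_u \quad \text{and} \quad \nu_{u:v} = |\Phi\rangle\langle\Phi|_{R_u L_v}. \qquad (158)$$

Moreover, the operators $\nu_{u:v}$ mutually commute. To see the relation with
eq. (157), note that the operators $A_u$ can be polar decomposed as
$A_u = U_u \sqrt{A_u^\dagger A_u} = U_u \mu_u^{1/2}$. The matrix $U_u$ is a partial isometry on $\mathcal{H}_u$ and,
writing $U = \bigotimes_{u \in V} U_u$,

$$|\Psi\rangle\langle\Psi| = \Big(\bigotimes_{u \in V} A_u\Big) |\Phi_0\rangle\langle\Phi_0| \Big(\bigotimes_{u \in V} A_u^\dagger\Big) \qquad (159)$$
$$= U \Big(\prod_{u \in V} \mu_u^{1/2}\Big) |\Phi_0\rangle\langle\Phi_0| \Big(\prod_{u \in V} \mu_u^{1/2}\Big) U^\dagger \qquad (160)$$
$$= U \Big[\Big(\prod_{u \in V} \mu_u\Big) \star \Big(\prod_{(u,v) \in E} \nu_{u:v}\Big)\Big] U^\dagger \qquad (161)$$

as claimed.

Footnote: Note that $\mu_u$ has rank $\leq d$. This can be seen straightforwardly by writing
$\mu_u = \sum_{j=1}^{d} |K_j\rangle\langle K_j|$ where $|K_j\rangle = \sum_{\alpha,\beta} \overline{A^{u,j}_{\alpha\beta}} \, |\alpha, \beta\rangle \in \mathcal{H}_u$.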
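
The polar decomposition and the rank bound on $\mu_u$ are easy to check numerically. The sketch below uses a singular value decomposition to obtain $U_u$ and $\mu_u^{1/2}$ for a single random vertex operator; the dimensions `d = 2`, `D = 3` and the random complex matrix are assumptions made only for illustration.

```python
import numpy as np

d, D = 2, 3                                   # illustrative dimensions
rng = np.random.default_rng(2)

# A_u as a d x D^2 matrix mapping H_{L_u} (x) H_{R_u} to C^d.
A = rng.normal(size=(d, D * D)) + 1j * rng.normal(size=(d, D * D))

# Reduced SVD: A = W diag(s) Vh, with W of shape (d, d) and Vh of shape (d, D^2).
W, s, Vh = np.linalg.svd(A, full_matrices=False)

U = W @ Vh                                    # partial isometry U_u
sqrt_mu = Vh.conj().T @ np.diag(s) @ Vh       # sqrt(A^dagger A) = mu_u^{1/2}
mu = A.conj().T @ A                           # mu_u = A_u^dagger A_u

rank_mu = np.linalg.matrix_rank(mu)
print(np.allclose(U @ sqrt_mu, A), rank_mu)
```

The rank of `mu` cannot exceed `d` because `A` has only `d` rows, which matches the statement in the footnote.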

Bifactor states are thus relevant to the description of quantum many-body
systems. QBP can sometimes be used to efficiently compute correlation functions,
but in general for spatial dimension larger than one, its convergence is not
guaranteed. This is mainly due to the presence of small loops in the underlying
graph. Partial solutions have been proposed to overcome this difficulty [VC04a],
and it is conceivable that techniques from loopy Belief Propagation and its
generalizations [YFW02a] will improve these algorithms. As in the classical case
however, QBP may be more appropriate for the study of quantum systems on
irregular sparse graphs, such as those encountered in classical spin glasses.

Finally, it should be noted that the Markov conditions required to certify
the convergence of QBP (or the associated coarse-grained Markov conditions
as explained in the previous section) are weaker than those typically
studied in statistical physics, namely the vanishing of connected correlation
functions beyond some length scale. For pure quantum states, the
two notions coincide and are equivalent to the absence of long-range entanglement.
At finite temperature however, the state is mixed and the vanishing
of mutual information between vertices $u$ and $u + \ell$ conditioned on vertices
$u + 1, \ldots, u + \ell - 1$, eq. (139), does not imply the absence of connected correlations
$\langle A_u A_{u+\ell}\rangle - \langle A_u\rangle\langle A_{u+\ell}\rangle = \mathrm{Tr}(\rho_V A_u A_{u+\ell}) - \mathrm{Tr}(\rho_V A_u)\,\mathrm{Tr}(\rho_V A_{u+\ell})$.
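
A small classical example makes the last point concrete. The sketch below is an illustration chosen here, not taken from the paper: it encodes a three-site classical Markov chain (each bit copies its left neighbor with flip probability `eps = 0.1`) as a diagonal density matrix, and evaluates both the conditional mutual information between sites 1 and 3 given site 2 and the connected $ZZ$ correlation between sites 1 and 3. The helper names are hypothetical.

```python
import numpy as np
from functools import reduce

def partial_trace(rho, keep, n):
    """Reduced state of the qubits listed in `keep` (0-based) out of n qubits."""
    t = rho.reshape([2] * (2 * n))
    m = n
    # Trace out unwanted sites, highest index first so axis numbers stay valid.
    for s in sorted(set(range(n)) - set(keep), reverse=True):
        t = np.trace(t, axis1=s, axis2=m + s)
        m -= 1
    dim = 2 ** len(keep)
    return t.reshape(dim, dim)

def entropy(rho):
    """Von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

eps = 0.1
p = np.zeros((2, 2, 2))
for a in range(2):
    for b in range(2):
        for c in range(2):
            p[a, b, c] = 0.5 * (1 - eps if b == a else eps) * (1 - eps if c == b else eps)
rho = np.diag(p.reshape(8))                    # mixed state, diagonal in the Z basis

S = lambda keep: entropy(partial_trace(rho, keep, 3))
# Conditional mutual information S(1 : 3 | 2) = S(12) + S(23) - S(2) - S(123).
cmi = S([0, 1]) + S([1, 2]) - S([1]) - S([0, 1, 2])

Z, I2 = np.diag([1.0, -1.0]), np.eye(2)
kron = lambda *ops: reduce(np.kron, ops)
Z1, Z3 = kron(Z, I2, I2), kron(I2, I2, Z)
connected = np.trace(rho @ Z1 @ Z3) - np.trace(rho @ Z1) * np.trace(rho @ Z3)

print(cmi, connected)
```

Here the conditional mutual information vanishes (up to rounding) while the connected correlation equals $(1 - 2\,\mathrm{eps})^2 = 0.64$, so the Markov condition holds even though sites 1 and 3 are correlated.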

8 Related Work 

In this section, our approach to quantum Graphical Models and Belief Propagation
is compared to other proposals that have appeared in the literature. Firstly,
Tucci has developed an approach to quantum Bayesian Networks [Tuc95a],
Markov Networks [Tuc07a] and Belief Propagation [Tuc98a] based on a different
analogy between quantum theory and classical probability, namely the idea that
probabilities should be replaced by complex valued amplitudes. Tucci's models
require that these amplitudes should factorize according to conditions similar to
those used in classical Graphical Models. One disadvantage of this is that the
definition requires a fixed basis to be chosen for the system at each vertex of the
graph, and the factorization condition for Bayesian Networks is not preserved
under changes of this basis. In contrast, our definition of quantum conditional
independence is based on an explicitly basis independent quantity, so it does
not have this problem. Another difficulty with using amplitudes is that they are
only well-defined for pure states, so that mixed states have to be represented as
purifications on larger networks. In our approach, density operators are taken as
primary, so mixed states can be represented without purification. On the other
hand, Tucci's definitions can easily accommodate unitary time evolution,
whereas we do not have a general treatment of dynamics in our approach at the
present time. A related definition of quantum Markov Networks, also based on
amplitudes but without a development of the corresponding Belief Propagation
algorithm, has been proposed by La Mura and Swiatczak [LMS07a], to which
similar comments apply.






There has also been work on Quantum Markov networks within the quantum
probability literature [Lei01a, AF03a, AF03b], although Belief Propagation has
not been investigated in this literature. This is closer to the spirit of the present
work, in the sense that it is based on the generalization of classical probability
to a noncommutative, operator-valued probability theory. These works are
primarily concerned with defining the Markov condition in such a way that it
can be applied to systems with an infinite number of degrees of freedom, and
hence an operator algebraic formalism is used. This is important for applications
to statistical physics because the thermodynamic limit can be formally
defined as the limit of an infinite number of systems, but it is not so important
for numerical simulations, since these necessarily operate with a finite number
of discretized degrees of freedom. Also, conditional independence is defined in a
different way via quantum conditional expectations, rather than the approach
based on conditional mutual information and conditional density operators used
in the present work. Nevertheless, it seems likely that there are connections to
our approach that should be investigated in future work.

Lastly, during the final stage of preparation of this manuscript, two related
papers have appeared on the physics archive. An article by Laumann, Scardicchio
and Sondhi [LSS07a] used a QBP-like algorithm to solve quantum models on sparse
graphs. Hastings [Has07b] proposed a QBP algorithm for the simulation of
quantum many-body systems based on ideas similar to the ones presented here.
The connection between the two approaches, and in particular the application of
the Lieb-Robinson bound [LR72a] to conditional mutual information, is worthy
of further investigation.

9 Conclusion 

In this paper, we have presented quantum Graphical Models and Belief Propagation
based on the idea that quantum theory is a noncommutative, operator-valued
generalization of probability theory. Our main results are summarized
in Fig. 9. We expect these methods to have significant applications in quantum
error correction and the simulation of many-body quantum systems. We
are currently in the process of implementing these algorithms numerically in both
of these contexts. Belief Propagation based decoding of several types of quantum
error correction codes has already been implemented quite successfully, e.g.
on concatenated block codes [Pou06b], turbo codes [OPT07a], and sparse codes
[COT05a]. However, for the noise models considered there, the corresponding
bifactor states only involve commuting operators and thus the corresponding
inference problem could be solved by means of a classical Belief Propagation
algorithm. We conclude with several open questions suggested by this work.

In the context of many-body physics, it would be interesting to relate the
class of solutions obtained by QBP to other approximation schemes used in
statistical physics, much in the spirit of the work of Yedidia [Yed01a] in the classical
setting. A related problem would be to understand how the different classes of
bifactor states relate to each other. We suspect that when the Hilbert space
dimension at each vertex of the graph is held fixed, the $n$-bifactor states on that
graph form a subset of the $m$-bifactor states when $n < m$. If that conjecture
were true, it might lead to a family of approximation schemes converging to the
correct solution. It would also reveal an interesting discrepancy between the
classical and quantum settings. Classically, the problem of computing correlation
functions in a disordered many-body system and the problem of decoding
an error correction code are equivalent. If our conjecture holds true, in the
quantum case the latter is simpler than the former.

Figure 9: Relation between Markov Networks, Bifactor Networks, and 1-Bifactor
Networks in a) quantum theory and b) classical probability theory. The hashed
regions indicate the domain of convergence of the associated Belief Propagation
algorithms. Figure a). Convergence of Belief Propagation on trees for Markov
Networks is Theorem 5.6 and for 1-Bifactor states is Corollary 5.7. That all
Markov Networks on trees are Bifactor states is Theorem 4.10. The existence of
Bifactor Networks on trees that are not Markov Networks is given by Example
3.7 for $n < \infty$ and the Heisenberg antiferromagnetic spin chain of Example 4.8
for $n = \infty$. Markov Networks on trees with cliques of size $> 2$ are generally
not Bifactor Networks, c.f. Theorem 4.7. Figure b). That all classical Bifactor
Networks are Markov Networks is the Hammersley-Clifford Theorem 4.1 and
convergence of Belief Propagation on trees follows from Theorem 5.6.

Whilst our definition of a quantum Markov Network is well motivated as
a direct analog of a classical Markov Network, it does not seem to represent
the most general class of states to which our Belief Propagation algorithms are
applicable. In particular, in §5.2 it was shown that QBP converges on trees
for arbitrary bifactor states defined with respect to the $\star$ product. One reason
for this discrepancy might be that the quantum conditional independence
condition, $I_\rho(U, W|X)$, only allows classical correlations to be mediated between
$U$ and $W$ via $X$, i.e. $\rho_{U \cup W}$ is always separable, whereas the classical
condition $I_P(U, W|X)$ is compatible with an arbitrary distribution $P(U \cup W)$. This
suggests that quantum conditional independence could be relaxed to a condition
that allows quantum correlations, i.e. entanglement, to be mediated by $X$,
whilst still preserving the validity of Belief Propagation. It would be interesting
to find a condition like this that also satisfies the graphoid axioms, so that it
could naturally be represented on a graph.

Nevertheless, quite apart from their application in Belief Propagation
algorithms, the mathematical structures investigated in this work should be of
interest in other areas of quantum information and computation. Firstly, the
characterizations of quantum conditional independence in terms of conditional
density operators given in §3.3 should be useful, and indeed are currently being
applied to the problem of pooling quantum states [LS07a]. Another interesting
area of investigation would be the computational complexity of inference on
quantum Markov Networks. In the classical case, it is fairly straightforward to
find families of Markov Networks that encode instances of NP-complete problems,
such as satisfiability or graph colorability. Therefore, one would expect
to be able to encode problems that are similarly hard for quantum computers,
i.e. complete for the complexity class QMA, as inference problems on quantum
Markov Networks. This should be closely related to the quantum marginals
problem, which has recently been proved to be QMA-complete [AGK07a, Ira07a].

Finally, this work leaves open the question of fully characterizing quantum
Markov Networks. The most generally applicable result given here is Theorem
4.7, which is a direct analog of one direction of the classical Hammersley-Clifford
theorem using the $\odot$ product. A full characterization would provide a converse
to this theorem, i.e. a set of conditions on the operators in eq. (92), satisfied
by the construction used in the proof, such that all states of this form are
guaranteed to satisfy the Markov condition. Analogous theorems for the $\star^{(n)}$
products would also be useful. This work also leaves open the question of
intersection for quantum conditional mutual information, i.e. whether
$S(U : W | X \cup Y) = 0$ and $S(U : Y | W \cup X) = 0$ imply $S(U : W \cup Y | X) = 0$ for
strictly positive states. This result would imply that positive quantum Markov
networks obey global Markov properties.
A A useful notation for probability distributions and density matrices

A.1 Probability Distributions

In standard Kolmogorov probability theory for finite sample spaces, probabilities
are given by a measure $\mu$ on a sample space $(\Omega, 2^\Omega)$, where $\Omega$ is a set
of elementary events and $2^\Omega$ is the power set, i.e. the set of all subsets of $\Omega$.
Specifically, $\mu : 2^\Omega \to [0, 1]$ and satisfies the axioms

$$\forall A \in 2^\Omega, \quad 0 \leq \mu(A) \leq 1 \qquad (162)$$
$$\mu(\Omega) = 1 \qquad (163)$$
$$\text{If } A_1, A_2, \ldots, A_d \text{ are disjoint sets in } 2^\Omega \text{ then } \mu\Big(\bigcup_{j=1}^{d} A_j\Big) = \sum_{j=1}^{d} \mu(A_j). \qquad (164)$$

In particular, this implies that $\mu(\emptyset) = 0$ and $\forall A_1, A_2 \in 2^\Omega$,

$$\mu(A_1 \cup A_2) \geq \mu(A_1) \qquad (165)$$
$$\mu(A_1 \cap A_2) \leq \mu(A_1) \qquad (166)$$
$$\text{If } A_1 \subseteq A_2 \text{ then } \mu(A_1) \leq \mu(A_2). \qquad (167)$$

The conditional probability of $A_2$, given $A_1$, is defined to be

$$\mathrm{Prob}(A_2|A_1) = \frac{\mu(A_1 \cap A_2)}{\mu(A_1)} \qquad (168)$$

provided $\mu(A_1) \neq 0$ and is undefined otherwise. In particular, for any $A \in 2^\Omega$,
this means that $\mathrm{Prob}(A|\emptyset)$ is always undefined and that $\mathrm{Prob}(A|\Omega) = \mu(A)$.
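
These conventions can be made concrete on a tiny sample space. The sketch below is an illustration with made-up data (a fair four-outcome sample space) and hypothetical helper names; it checks axioms (162)-(164) by brute force over the power set and implements eq. (168), returning `None` where the conditional probability is undefined.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical sample space: four equally likely elementary events.
omega = frozenset({1, 2, 3, 4})
weights = {w: Fraction(1, 4) for w in omega}

def mu(event):
    """Measure of an event, i.e. of a subset of omega."""
    return sum((weights[w] for w in event), Fraction(0))

def power_set(s):
    items = list(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]

def prob(a2, a1):
    """Prob(A2 | A1) as in eq. (168); None stands for 'undefined'."""
    if mu(a1) == 0:
        return None
    return mu(a1 & a2) / mu(a1)

events = power_set(omega)
axioms_hold = (
    all(0 <= mu(a) <= 1 for a in events)                      # eq. (162)
    and mu(omega) == 1                                         # eq. (163)
    and all(mu(a | b) == mu(a) + mu(b)                         # eq. (164), pairwise
            for a in events for b in events if not a & b)
)

A = frozenset({1, 2})
print(axioms_hold, mu(frozenset()), prob(A, frozenset()), prob(A, omega))
```

Exact rational arithmetic via `Fraction` is used so that additivity can be tested with `==` rather than a floating point tolerance.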

Our notation for probability distributions over random variables works in an
almost exactly opposite way to the Kolmogorov conventions, but is very convenient
for the discussion of Graphical Models. For a random variable $v$ that takes
a finite number of possible values, write $P(v)$ for the probability distribution of
$v$. For definiteness, suppose that $v$ takes integer values $\{1, 2, \ldots, d\}$. Then, a sample
space can be associated with $v$ by setting $\Omega_v = \{v = 1, v = 2, \ldots, v = d\}$,
and a measure $\mu : 2^{\Omega_v} \to [0, 1]$ can be defined on this space. The notation
$P(v)$ is a stand in for $\mu(v = j)$ when $j$ is an arbitrary unspecified value. To
give some precise examples of how this works, let $f$ be a function with domain
$\{1, 2, \ldots, d\}$ and let $g$ be a function with domain $[0, 1]$. Then, the expression
$g(P(v)) = f(v)$ is interpreted as $\forall j,\ g(\mu(v = j)) = f(j)$, and the expression
$\sum_v g(P(v))$ is interpreted as $\sum_j g(\mu(v = j))$. It is straightforward to
see how this generalizes to more complicated examples.

Now consider the case of two random variables $v, w$ for which we can set up
sample spaces $\Omega_v$ and $\Omega_w$, as above. Joint probabilities are given by a measure
$\mu$ on the sample space $(\Omega_v \times \Omega_w, 2^{\Omega_v \times \Omega_w})$. The notation $P(v, w)$ stands for
$\mu(v = j \times w = k)$, where both $j$ and $k$ are arbitrary unspecified values. Note
that

$$\mu(v = j \times w = k) = \mu\big((v = j \times \Omega_w) \cap (\Omega_v \times w = k)\big). \qquad (169)$$

The notation $P(v, w)$ can be made precise in the same way as the examples
given above for a single variable, but two additional definitions are worthy of
note. Firstly, the marginal probability of $v$ is written as $P(v) = \sum_w P(v, w)$
and this corresponds to the equation

$$\mu(v = j \times \Omega_w) = \sum_k \mu(v = j \times w = k). \qquad (170)$$

Secondly, the conditional probability of $w$ given $v$ is written as $P(w|v) = \frac{P(v, w)}{P(v)}$,
which corresponds to

$$\mathrm{Prob}(\Omega_v \times w = k \,|\, v = j \times \Omega_w) = \frac{\mu\big((v = j \times \Omega_w) \cap (\Omega_v \times w = k)\big)}{\mu(v = j \times \Omega_w)} = \frac{\mu(v = j \times w = k)}{\mu(v = j \times \Omega_w)}. \qquad (171)$$

The generalization of this to arbitrary numbers of random variables is straight- 
forward. 
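
In array terms, the marginal and conditional distributions above are a sum over an axis and an elementwise division. The following sketch uses a made-up joint table for a two-valued $v$ and a three-valued $w$; the numbers are illustrative only.

```python
import numpy as np

# Hypothetical joint distribution P(v, w): rows index v, columns index w.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.05, 0.30]])

# Marginal P(v) = sum_w P(v, w), cf. eq. (170).
P_v = joint.sum(axis=1)

# Conditional P(w | v) = P(v, w) / P(v); each row is a distribution over w.
P_w_given_v = joint / P_v[:, None]

print(P_v)
print(P_w_given_v.sum(axis=1))
```

The division is only meaningful for values of $v$ with $P(v) \neq 0$, mirroring the proviso attached to eq. (168).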

The present notation can be extended to a set of random variables $V =
\{v_1, v_2, \ldots, v_N\}$, where $v_j$ is a random variable taking values in $\{1, 2, \ldots, d_j\}$.
Consider the joint probability distribution of an arbitrary subset $U \subseteq V$. Let
$I = \{i_1, i_2, \ldots, i_M\}$ be the index set of $U$, i.e. the subset of $\{1, 2, \ldots, N\}$
consisting of the indices of the $v_j$'s that are contained in $U$. Then define
$P(U) = P(v_{i_1}, v_{i_2}, \ldots, v_{i_M})$. This implies that $P(\emptyset) = 1$, which is opposite
to the Kolmogorov convention for events, but recall that here $\emptyset$ is an empty set
of random variables rather than an event in a sample space. To see this, note
that the expression $P(U)$ may be read as meaning that the variables in $U$ are
constrained to take particular values, whilst the variables in $V - U$, the relative
complement of $U$ in $V$, may take any value. Thus $P(\emptyset)$ is the probability of
the event corresponding to no constraints, i.e. the entire sample space. More
precisely, if we define $K = \{k_1, k_2, \ldots, k_{N-M}\}$ to be the index set of $V - U$ and
let $j_1, j_2, \ldots, j_M$ be particular instantiations of $v_{i_1}, v_{i_2}, \ldots, v_{i_M}$, then $P(U)$
corresponds to $\mu(v_{i_1} = j_1 \times v_{i_2} = j_2 \times \ldots \times v_{i_M} = j_M \times \Omega_{v_{k_1}} \times \Omega_{v_{k_2}} \times \ldots \times \Omega_{v_{k_{N-M}}})$.
Thus, for $U = \emptyset$ we have $P(\emptyset) = \mu(\Omega_{v_1} \times \Omega_{v_2} \times \ldots \times \Omega_{v_N}) = 1$ via the standard
Kolmogorov axioms.
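
The convention $P(\emptyset) = 1$ falls out naturally when $P(U)$ is implemented as "sum the joint table over every variable not in $U$". The sketch below does this for a randomly generated joint distribution of three variables; the table, the 0-based variable labels and the function name `P` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint distribution of three variables with 2, 3 and 2 values.
joint = rng.random((2, 3, 2))
joint /= joint.sum()

def P(U):
    """P(U) for a set U of 0-based variable labels: sum over all variables not in U."""
    drop = tuple(i for i in range(joint.ndim) if i not in U)
    if not drop:                      # U = V: nothing to sum over
        return joint.copy()
    return joint.sum(axis=drop)

print(P(set()))                       # no constraints: the whole sample space
print(P({0}).shape, P({0, 2}).shape)

# Constraining more variables can only lower the probability: for U contained
# in W, every entry of P(W) is bounded by the matching entry of P(U).
monotone = bool(np.all(P({0})[:, None] >= P({0, 1})))
print(monotone)
```

`P(set())` sums over every axis and therefore returns the total probability, which is the array-level counterpart of $\mu(\Omega_{v_1} \times \ldots \times \Omega_{v_N}) = 1$.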

All the usual set theoretic notions can be applied at the level of random
variables, and it is straightforward to verify that the following relations hold for
all $U, W \subseteq V$:

$$P(U \cup W) \leq P(U) \qquad (172)$$
$$P(U \cap W) \geq P(U) \qquad (173)$$
$$\text{If } U \supseteq W \text{ then } P(U) \leq P(W). \qquad (174)$$

Conditional probabilities $P(W|U)$ are only well-defined for disjoint subsets, so
$P(W|V)$ is always undefined and $P(W|\emptyset) = P(W)$.

Finally, note that this notation introduces an ambiguity for singleton sets
$\{v\}$, since $P(v)$ and $P(\{v\})$ denote the same object. These are used interchangeably
and set theoretic operations like $U \cup \{v\}$ are denoted $U \cup v$ when this does
not cause ambiguity.

A.2 Density Matrices

For quantum theory, the corresponding notation is obtained by replacing random
variables $v$ with finite-dimensional Hilbert spaces $\mathcal{H}_v$ and $P(v)$ with a density
matrix $\rho_v$ acting on $\mathcal{H}_v$. The density matrix $\rho_v$ is referred to as the state of
system $v$, with the fact that it is defined on a corresponding Hilbert space $\mathcal{H}_v$
left implicit. If we have a set of $N$ quantum systems $V = \{v_1, v_2, \ldots, v_N\}$,
then the state $\rho_V$ is defined on the Hilbert space $\mathcal{H}_{v_1} \otimes \mathcal{H}_{v_2} \otimes \ldots \otimes \mathcal{H}_{v_N}$. For
an arbitrary subset $U \subseteq V$, the state $\rho_U$ is defined to be the partial trace
of $\rho_V$ over all the systems in $V - U$. With this convention, $\emptyset$ is associated
with the trivial Hilbert space $\mathbb{C}$, so that $\rho_\emptyset = 1$. It is convenient to suppress
tensor products with identity operators in order to equate operators acting on
different subsets of $V$. Explicitly, if $U, W \subseteq V$ and $A_U$ and $B_W$ are operators
acting on $\mathcal{H}_U$ and $\mathcal{H}_W$ respectively, then $A_U = B_W$ is defined to mean
$A_U \otimes I_{W-(U \cap W)} = B_W \otimes I_{U-(U \cap W)}$. Generally, identity operators are omitted
in this way unless their presence is required to clarify an argument.

B Proof of Theorem 4.7

Lemma B.1. Let $V$ be a collection of quantum systems with Hilbert space
$\mathcal{H}_V = \bigotimes_{v \in V} \mathcal{H}_v$ and let $H_V$ be an operator on $\mathcal{H}_V$. Let $|\alpha\rangle_v \in \mathcal{H}_v$ be a set
of pure states, where $|\alpha\rangle_v$ may be a different state for each $v$, and for $U \subseteq V$
define $|\alpha\rangle_U = \bigotimes_{v \in U} |\alpha\rangle_v$. For all $U \subseteq V$ define

$$J_U = \langle\alpha|_{V-U} H_V |\alpha\rangle_{V-U} \otimes I_{V-U}, \qquad (175)$$

where $V - U$ denotes the relative complement of $U$ in $V$, and

$$K_U = \sum_{W \subseteq U} (-1)^{|U - W|} J_W, \qquad (176)$$

where $|\cdot|$ denotes the order of a set, i.e. the number of elements it contains. Then,

$$H_V = \sum_{U \subseteq V} K_U. \qquad (177)$$

Proof. Consider the double sum expression obtained by substituting eq. (176)
into the right hand side of eq. (177),

$$\sum_{U \subseteq V} \sum_{W \subseteq U} (-1)^{|U - W|} J_W. \qquad (178)$$

Note that the coefficient of $J_W$ in this expression is

$$\sum_{\{U : W \subseteq U \subseteq V\}} (-1)^{|U - W|} = \sum_{X \subseteq (V - W)} (-1)^{|X|}. \qquad (179)$$

If $W = V$ then $\emptyset$ is the only subset of $V - W$, so the last sum reduces to $(-1)^0 =
1$. The corresponding term in eq. (178) is just $H_V$, so it just remains to prove
that all the other terms sum to 0. For $W \neq V$, choose an arbitrary element $v \in
(V - W)$. Let $\mathcal{X} = \{X \subseteq (V - W) \,|\, v \notin X\}$ and let $\bar{\mathcal{X}} = \{X \subseteq (V - W) \,|\, v \in X\}$.
For each $X \in \mathcal{X}$, define $\bar{X} \in \bar{\mathcal{X}}$ via $\bar{X} = X \cup \{v\}$. This correspondence is a
bijection, so exactly half of the subsets of $V - W$ contain $v$ and the other half
do not contain $v$. Further, if $X \in \mathcal{X}$ has even order then $\bar{X}$ has odd order, and if
$X \in \mathcal{X}$ has odd order then $\bar{X}$ has even order. Thus, there are an equal number
of odd and even order subsets of $V - W$, so the right hand side of eq. (179) is
zero. □

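Lemma B.2. Let $V$ be a collection of quantum systems with Hilbert space
$\mathcal{H}_V = \bigotimes_{v \in V} \mathcal{H}_v$ and let $H_V$ be an operator on $\mathcal{H}_V$. Let $|\alpha\rangle_v \in \mathcal{H}_v$ be a set of
pure states. For nonempty $U \subseteq V$ define $K_U$ as in eq. (176) and let $u \in U$.
Then

$$\langle\alpha|_u K_U |\alpha\rangle_u = 0. \qquad (180)$$

Proof. Let $W \subseteq U$. If $u \notin W$ then $\langle\alpha|_u J_W |\alpha\rangle_u = \langle\alpha|_{V-W} H_V |\alpha\rangle_{V-W} \otimes I_{V-(W \cup \{u\})}$.
Also,

$$\langle\alpha|_u J_{W \cup \{u\}} |\alpha\rangle_u = \langle\alpha|_u \Big( \langle\alpha|_{V-(W \cup \{u\})} H_V |\alpha\rangle_{V-(W \cup \{u\})} \Big) |\alpha\rangle_u \otimes I_{V-(W \cup \{u\})} \qquad (181)$$
$$= \langle\alpha|_{V-W} H_V |\alpha\rangle_{V-W} \otimes I_{V-(W \cup \{u\})} \qquad (182)$$
$$= \langle\alpha|_u J_W |\alpha\rangle_u. \qquad (183)$$

From the same argument that was used in Lemma B.1, the element $u$ divides
the subsets of $U$ into pairs, i.e. those that don't contain $u$ and those obtained
by adding $u$ to such a set. As shown above, the operator obtained by projecting
the $J$ operator onto $|\alpha\rangle_u$ is the same for each such pair of subsets, but they enter
into eq. (176) with opposite sign and so the corresponding terms in $\langle\alpha|_u K_U |\alpha\rangle_u$
cancel. □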
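
Both lemmas can be checked numerically on a few qubits. The sketch below is an illustration under assumptions made here (three qubits, a random Hermitian $H_V$, random pure states $|\alpha\rangle_v$, hypothetical helper names). It implements the map $X \mapsto \langle\alpha|_v X |\alpha\rangle_v \otimes I_v$ as $\sum_k |k\rangle\langle\alpha| X |\alpha\rangle\langle k|$ on site $v$, builds $J_U$ and $K_U$ from eqs. (175) and (176), and then tests eqs. (177) and (180).

```python
import numpy as np
from functools import reduce
from itertools import combinations

N = 3                                   # number of qubits (illustrative)
V = tuple(range(N))
rng = np.random.default_rng(3)

# Random Hermitian operator H_V and one random pure state |alpha>_v per site.
X = rng.normal(size=(2**N, 2**N)) + 1j * rng.normal(size=(2**N, 2**N))
H = X + X.conj().T
alphas = []
for _ in V:
    a = rng.normal(size=2) + 1j * rng.normal(size=2)
    alphas.append(a / np.linalg.norm(a))

def at_site(op, v):
    """Embed a single-qubit operator at site v, identity elsewhere."""
    mats = [np.eye(2)] * N
    mats[v] = op
    return reduce(np.kron, mats)

def project(Xop, v):
    """X -> <alpha|_v X |alpha>_v (x) I_v, as sum_k |k><alpha| X |alpha><k| on site v."""
    a = alphas[v]
    return sum(at_site(np.outer(e, a.conj()), v) @ Xop @ at_site(np.outer(a, e), v)
               for e in np.eye(2))

def subsets(S):
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def J(U):
    """Eq. (175): project out every site not in U and pad with identities."""
    out = H
    for v in V:
        if v not in U:
            out = project(out, v)
    return out

def K(U):
    """Eq. (176): alternating sum of J_W over subsets W of U."""
    return sum((-1) ** len(U - W) * J(W) for W in subsets(tuple(U)))

total = sum(K(U) for U in subsets(V))
lemma_b1 = np.allclose(total, H)                                  # eq. (177)
lemma_b2 = all(np.allclose(project(K(U), u), 0)                   # eq. (180)
               for U in subsets(V) if U for u in U)
print(lemma_b1, lemma_b2)
```

Since `project(K(U), u)` equals $\langle\alpha|_u K_U |\alpha\rangle_u \otimes I_u$, it vanishes exactly when eq. (180) holds.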

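Proof of Theorem 4.7. Apply Lemma B.1 with $H_V = \log \rho_V$ and set $\sigma_U =
\exp(K_U)$ for all $U \subseteq V$. Rewriting eq. (177) in terms of these operators gives

$$\rho_V = \bigodot_{U \subseteq V} \sigma_U. \qquad (184)$$

It remains to show that $\sigma_U$ is the identity whenever $U \notin \mathcal{C}$ (i.e. whenever $U$ is not a clique), which is equivalent
to proving that $K_U = 0$.

For any $U \notin \mathcal{C}$, we can find two vertices $u, t \in U$ that are not connected by
an edge. In particular, this means that $t \notin n(u)$. Then, the Markov condition,
$I(\{u\} : V - (\{u\} \cup n(u)) \,|\, n(u))$, implies that

$$\log \rho_{u|V-\{u\}} = \log \rho_{u|n(u)} \otimes I_{V-(\{u\} \cup n(u))} \qquad (185)$$
$$= \log \rho_{u|n(u)} \otimes I_{V-(\{u\} \cup \{t\} \cup n(u))} \otimes I_t. \qquad (186)$$

Now, let $\mathcal{U} = \{W \subseteq U \,|\, u \notin W\}$ and let $\bar{\mathcal{U}} = \{W \subseteq U \,|\, u \in W\}$. As before, every
$W \in \mathcal{U}$ is in one-to-one correspondence with a $\bar{W} \in \bar{\mathcal{U}}$ defined by $\bar{W} = W \cup \{u\}$,
and so eq. (176) may be rewritten as

$$K_U = \sum_{W \in \mathcal{U}} (-1)^{|U - W|} (J_W - J_{\bar{W}}). \qquad (187)$$

Next, consider a particular $W$ and the corresponding term $J_W - J_{\bar{W}}$. Using the
standard rules of conditional density operators,

$$J_W = \langle\alpha|_{V-W} \log \rho_V |\alpha\rangle_{V-W} \otimes I_{V-W} \qquad (188)$$
$$= \langle\alpha|_{V-W} \log \rho_{u|V-u} |\alpha\rangle_{V-W} \otimes I_{V-W} + \langle\alpha|_{V-W} \log \rho_{V-u} \otimes I_u |\alpha\rangle_{V-W} \otimes I_{V-W} \qquad (189)$$
$$= \langle\alpha|_{V-W} \log \rho_{u|V-u} |\alpha\rangle_{V-W} \otimes I_{V-W} + \langle\alpha|_{V-(W \cup \{u\})} \log \rho_{V-u} |\alpha\rangle_{V-(W \cup \{u\})} \otimes I_{V-(W \cup \{u\})} \otimes I_u. \qquad (190)$$

Similarly, $J_{\bar{W}}$ may be written as

$$J_{\bar{W}} = \langle\alpha|_{V-\bar{W}} \log \rho_V |\alpha\rangle_{V-\bar{W}} \otimes I_{V-\bar{W}} \qquad (191)$$
$$= \langle\alpha|_{V-(W \cup \{u\})} \log \rho_V |\alpha\rangle_{V-(W \cup \{u\})} \otimes I_{V-(W \cup \{u\})} \qquad (192)$$
$$= \langle\alpha|_{V-(W \cup \{u\})} \log \rho_{u|V-u} |\alpha\rangle_{V-(W \cup \{u\})} \otimes I_{V-(W \cup \{u\})} + \langle\alpha|_{V-(W \cup \{u\})} \log \rho_{V-u} \otimes I_u |\alpha\rangle_{V-(W \cup \{u\})} \otimes I_{V-(W \cup \{u\})} \qquad (193)$$
$$= \langle\alpha|_{V-(W \cup \{u\})} \log \rho_{u|V-u} |\alpha\rangle_{V-(W \cup \{u\})} \otimes I_{V-(W \cup \{u\})} + \langle\alpha|_{V-(W \cup \{u\})} \log \rho_{V-u} |\alpha\rangle_{V-(W \cup \{u\})} \otimes I_{V-(W \cup \{u\})} \otimes I_u. \qquad (194)$$

The last terms of (190) and (194) are identical, so they cancel in $J_W - J_{\bar{W}}$. Therefore,
$J_W - J_{\bar{W}}$ is just the difference of the first terms of (190) and (194). The remainder of the
proof shows that $\langle\alpha|_t (J_W - J_{\bar{W}}) |\alpha\rangle_t \otimes I_t = J_W - J_{\bar{W}}$. From this it follows that
$\langle\alpha|_t K_U |\alpha\rangle_t \otimes I_t = K_U$, but Lemma B.2 shows that $\langle\alpha|_t K_U |\alpha\rangle_t = 0$, so this is
enough to complete the proof.

There are two cases to deal with, either $t \notin W$ or $t \in W$. When $t \notin W$,
both $V - W$ and $V - (W \cup \{u\})$ contain $t$. The effect of projecting out $|\alpha\rangle_t$
on terms (190) and (194) is to replace $I_{V-W}$ and $I_{V-(W \cup \{u\})}$ with $I_{V-(W \cup \{t\})}$
and $I_{V-(W \cup \{u\} \cup \{t\})}$ respectively, but then tensoring with $I_t$ restores the original
identity operator so both terms are unaffected. In the case where $t \in W$, we
make use of the Markov condition in the form of eq. (186). The important point
is that $\rho_{u|V-u}$ is of the form $T_{V-t} \otimes I_t$, so projecting out $|\alpha\rangle_t$ and retensoring
with $I_t$ again has no effect on the terms (190) and (194). □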